
Last edited by 细胞海洋 on 2010-7-20 15:26

Modern biology has become an information science. Since the invention of a DNA sequencing method by Sanger in the late seventies, public repositories of genomic sequences have been growing exponentially, doubling in size every 16 months, a rate often compared to the growth of semiconductor transistor densities in CPUs known as Moore’s Law. In the nineties, the public–private race to sequence the human genome further intensified the fervor to generate high-throughput biomolecular data from highly parallel and miniaturized instruments. Today, sequencing data from thousands of genomes, including plant, mammalian, and microbial genomes, are accumulating at an unprecedented rate. The advent of second-generation DNA sequencing instruments, high-density cDNA microarrays, tandem mass spectrometers, and high-power NMRs has fueled the growth of molecular biology into a wide spectrum of disciplines such as personalized genomics, functional genomics, proteomics, metabolomics, and structural genomics. Few experiments in molecular biology and genetics performed today can afford to ignore the vast amount of biological information publicly accessible. Suddenly, molecular biology and genetics have become data rich.

Biological data mining is a data-guzzling turbo engine for postgenomic biology, driving the competitive race toward unprecedented biological discovery opportunities in the twenty-first century. Classical bioinformatics emerged from the study of macromolecules in molecular biology, biochemistry, and biophysics. Analysis, comparison, and classification of DNA and protein sequences were the dominant themes of bioinformatics in the early nineties. Machine learning mainly focused on predicting gene and protein functions from their sequences and structures. The understanding of cellular functions and processes underlying complex diseases was out of reach. Bioinformatics scientists were a rare breed, and their contribution to molecular biology and genetics was considered marginal, because the computational tools available then for biomolecular data analysis were far more primitive than the array of experimental techniques and assays that were available to life scientists.

Today, we are witnessing the reversal of these past trends. Diverse sets of data types that cover a broad spectrum of genotypes and phenotypes, particularly those related to human health and diseases, have become available. Many interdisciplinary researchers, including applied computer scientists, applied mathematicians, biostatisticians, biomedical researchers, clinical scientists, and biopharmaceutical professionals, have discovered in biology a goldmine of knowledge leading to many exciting possibilities: unraveling the tree of life, harnessing the power of microbial organisms for renewable energy, finding new ways to diagnose disease early, and developing new therapeutic compounds that save lives. Much of the experimental high-throughput biology data are generated and analyzed “in haste,” therefore leaving plenty of opportunities for knowledge discovery even after the original data are released. Most of the bets on the race to separate the wheat from the chaff have been placed on biological data mining techniques. After all, when easy, straightforward, first-pass data analysis has not yielded novel biological insights, data mining techniques must be able to help, or so many presumed.