
This post was last edited by 细胞海洋 on 2010-7-20 15:26
Modern biology has become an information science. Since the invention of a DNA sequencing method by Sanger in the late seventies, public repositories of genomic sequences have been growing exponentially, doubling in size every 16 months, a rate often compared to the growth of semiconductor transistor densities in CPUs known as Moore’s Law. In the nineties, the public–private race to sequence the human genome further intensified the fervor to generate high-throughput biomolecular data from highly parallel and miniaturized instruments. Today, sequencing data from thousands of genomes, including plant, mammalian, and microbial genomes, are accumulating at an unprecedented rate. The advent of second-generation DNA sequencing instruments, high-density cDNA microarrays, tandem mass spectrometers, and high-power NMRs has fueled the growth of molecular biology into a wide spectrum of disciplines such as personalized genomics, functional genomics, proteomics, metabolomics, and structural genomics. Few experiments in molecular biology and genetics performed today can afford to ignore the vast amount of biological information publicly accessible. Suddenly, molecular biology and genetics have become data rich.
Biological data mining is a data-guzzling turbo engine for postgenomic biology, driving the competitive race toward unprecedented biological discovery opportunities in the twenty-first century. Classical bioinformatics emerged from the study of macromolecules in molecular biology, biochemistry, and biophysics. Analysis, comparison, and classification of DNA and protein sequences were the dominant themes of bioinformatics in the early nineties. Machine learning mainly focused on predicting gene and protein functions from their sequences and structures. The understanding of cellular functions and processes underlying complex diseases was out of reach. Bioinformatics scientists were a rare breed, and their contribution to molecular biology and genetics was considered marginal, because the computational tools available then for biomolecular data analysis were far more primitive than the array of experimental techniques and assays that were available to life scientists.
Today, we are witnessing the reversal of these past trends. Diverse sets of data types that cover a broad spectrum of genotypes and phenotypes, particularly those related to human health and diseases, have become available. Many interdisciplinary researchers, including applied computer scientists, applied mathematicians, biostatisticians, biomedical researchers, clinical scientists, and biopharmaceutical professionals, have discovered in biology a gold mine of knowledge leading to many exciting possibilities: unraveling the tree of life, harnessing the power of microbial organisms for renewable energy, finding new ways to diagnose disease early, and developing new therapeutic compounds that save lives. Much of the experimental high-throughput biology data are generated and analyzed “in haste,” therefore leaving plenty of opportunities for knowledge discovery even after the original data are released. Most of the bets on the race to separate the wheat from the chaff have been placed on biological data mining techniques. After all, when easy, straightforward, first-pass data analysis has not yielded novel biological insights, data mining techniques must be able to help, or so many presumed.