
This post was last edited by 细胞海洋 on 2010-7-20 15:26
Modern biology has become an information science. Since the invention of a
DNA sequencing method by Sanger in the late seventies, public repositories
of genomic sequences have been growing exponentially, doubling in size every
16 months, a rate often compared to the growth of semiconductor transistor
densities in CPUs known as Moore’s Law. In the nineties, the public–private
race to sequence the human genome further intensified the fervor to generate
high-throughput biomolecular data from highly parallel and miniaturized
instruments. Today, sequencing data from thousands of genomes, including
plant, mammalian, and microbial genomes, are accumulating at an unprecedented
rate. The advent of second-generation DNA sequencing instruments,
high-density cDNA microarrays, tandem mass spectrometers, and high-power
NMRs has fueled the growth of molecular biology into a wide spectrum of
disciplines such as personalized genomics, functional genomics, proteomics,
metabolomics, and structural genomics. Few experiments in molecular biology
and genetics performed today can afford to ignore the vast amount of
biological information publicly accessible. Suddenly, molecular biology and
genetics have become data rich.
Biological data mining is a data-guzzling turbo engine for postgenomic
biology, driving the competitive race toward unprecedented biological
discovery opportunities in the twenty-first century. Classical bioinformatics emerged
from the study of macromolecules in molecular biology, biochemistry, and
biophysics. Analysis, comparison, and classification of DNA and protein
sequences were the dominant themes of bioinformatics in the early nineties.
Machine learning mainly focused on predicting gene and protein functions
from their sequences and structures. The understanding of cellular functions
and processes underlying complex diseases was out of reach. Bioinformatics
scientists were a rare breed, and their contribution to molecular biology and
genetics was considered marginal, because the computational tools available
then for biomolecular data analysis were far more primitive than the array
of experimental techniques and assays that were available to life scientists.
Today, we are witnessing the reversal of these past trends. Diverse sets
of data types that cover a broad spectrum of genotypes and phenotypes,
particularly those related to human health and diseases, have become available.
Many interdisciplinary researchers, including applied computer scientists,
applied mathematicians, biostatisticians, biomedical researchers, clinical
scientists, and biopharmaceutical professionals, have discovered in biology a
gold mine of knowledge leading to many exciting possibilities: unraveling the
tree of life, harnessing the power of microbial organisms for renewable energy,
finding new ways to diagnose disease early, and developing new therapeutic
compounds that save lives. Much of the experimental high-throughput biology
data are generated and analyzed “in haste,” therefore leaving plenty of
opportunities for knowledge discovery even after the original data are released. Most
of the bets on the race to separate the wheat from the chaff have been placed
on biological data mining techniques. After all, when easy, straightforward,
first-pass data analysis has not yielded novel biological insights, data mining
techniques must be able to help, or so many presumed.