PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES
Abstract
Predicting the secondary structure of proteins is important in biochemistry because the 3D
structure can be determined from the local folds that are found in secondary structures.
Moreover, knowing the tertiary structure of proteins can assist in determining their functions.
The objective of this thesis is to compare the performance of Neural Networks (NN) and
Support Vector Machines (SVM) in predicting the secondary structure of 62 globular proteins
from their primary sequence. For each of NN and SVM, we created six binary classifiers to
distinguish between the classes helix (H), strand (E), and coil (C). For NN we use Resilient
Backpropagation training with and without early stopping. We use NN with either no hidden
layer or with one hidden layer with 1, 2, ..., 40 hidden neurons. For SVM we use a Gaussian
kernel with its parameter fixed at 0.1 and a varying cost parameter C in the range [0.1, 5].
10-fold cross-validation is used to obtain overall estimates for the probability of making a correct
prediction. Our experiments indicate that, for both NN and SVM, the different binary classifiers
have varying accuracies: from 69% correct predictions for coil vs. non-coil up to 80% correct
predictions for strand vs. non-strand. It is further demonstrated that an NN with no hidden layer,
or with no more than 2 neurons in the hidden layer, is sufficient for good predictions. For
SVM we show that the estimated accuracies do not depend on the value of the cost parameter.
As a major result, we demonstrate that the accuracy estimates of the NN and SVM binary
classifiers cannot be distinguished from each other. This contradicts a current belief in
bioinformatics that SVM outperforms other predictors.
keywords: Neural Networks, Support Vector Machines, Protein Secondary Structure Prediction
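The evaluation setup described in the abstract (six binary classifiers over H, E and C, scored by 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the thesis code. The abstract names only the coil vs. non-coil and strand vs. non-strand tasks, so the split into three one-vs-rest and three pairwise tasks is an assumption, as are the contiguous, unshuffled folds. All function names are hypothetical, and `fit_and_count_correct` stands in for whichever NN or SVM training routine is used.

```python
from itertools import combinations

CLASSES = ("H", "E", "C")  # helix, strand, coil


def binary_tasks(classes=CLASSES):
    """Return six (name, positive_set, negative_set) binary tasks."""
    tasks = []
    # Three one-vs-rest tasks, e.g. coil vs. non-coil.
    for c in classes:
        rest = frozenset(classes) - {c}
        tasks.append((f"{c}/~{c}", frozenset({c}), rest))
    # Three pairwise tasks (an assumption; the third class is left out).
    for a, b in combinations(classes, 2):
        tasks.append((f"{a}/{b}", frozenset({a}), frozenset({b})))
    return tasks


def relabel(labels, positive, negative):
    """Map per-residue labels to +1/-1, keeping their original indices."""
    out = []
    for i, lab in enumerate(labels):
        if lab in positive:
            out.append((i, 1))
        elif lab in negative:
            out.append((i, -1))
    return out


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(n, k, fit_and_count_correct):
    """Pool correct predictions over all k held-out folds and divide by n."""
    correct = 0
    for test in k_fold_indices(n, k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        # The callback trains on `train` and returns the number of
        # correctly predicted examples in `test`.
        correct += fit_and_count_correct(train, test)
    return correct / n
```

Pooling the correct predictions over all held-out folds gives one overall accuracy estimate per binary task, which matches the kind of figure the abstract reports (69% to 80%). Whether the thesis forms its folds over residues or over whole proteins is not stated in the abstract.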