|

|
PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES
Abstract
Predicting the secondary structure of proteins is important in biochemistry because the 3D
structure can be determined from the local folds that are found in secondary structures.
Moreover, knowing the tertiary structure of proteins can assist in determining their functions.
The objective of this thesis is to compare the performance of Neural Networks (NN) and
Support Vector Machines (SVM) in predicting the secondary structure of 62 globular proteins
from their primary sequence. For each of NN and SVM, we created six binary classifiers to
distinguish between the classes helix (H), strand (E), and coil (C). For NN we use Resilient
Backpropagation training with and without early stopping. We use NN with either no hidden
layer or with one hidden layer with 1, 2, ..., 40 hidden neurons. For SVM we use a Gaussian
kernel with its parameter fixed at 0.1 and a varying cost parameter C in the range [0.1, 5].
10-fold cross-validation is used to obtain overall estimates for the probability of making a correct
prediction. Our experiments indicate for both NN and SVM that the different binary classifiers
have varying accuracies: from 69% correct predictions for coil vs. non-coil up to 80% correct
predictions for strand vs. non-strand. It is further demonstrated that NN with no hidden layer
or with no more than 2 hidden neurons in the hidden layer are sufficient for better predictions. For
SVM we show that the estimated accuracies do not depend on the value of the cost parameter.
As a major result, we demonstrate that the accuracy estimates of the NN and SVM binary
classifiers cannot be distinguished from each other. This contradicts a modern belief in bioinformatics that SVM
outperforms other predictors.
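The experimental setup described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the thesis code. The abstract names two of the six binary tasks (coil vs. non-coil and strand vs. non-strand); the sketch assumes the full set is three one-vs-rest tasks plus three one-vs-one tasks. It also assumes the Gaussian kernel has the form exp(-gamma * ||x - y||^2) with gamma = 0.1, and that folds are formed by simple interleaving of sample indices. All function names are hypothetical.

```python
from itertools import combinations
import math

CLASSES = ["H", "E", "C"]  # helix, strand, coil


def binary_tasks(classes):
    """Return (name, positive_set, negative_set) for each binary task.

    Assumption: three one-vs-rest tasks followed by three one-vs-one tasks.
    """
    tasks = []
    for c in classes:
        rest = {k for k in classes if k != c}
        tasks.append((f"{c} vs. non-{c}", {c}, rest))
    for a, b in combinations(classes, 2):
        tasks.append((f"{a} vs. {b}", {a}, {b}))
    return tasks


def relabel(labels, positive, negative):
    """Map class labels to +1/-1 targets, skipping labels in neither set.

    Returns the kept sample indices and their targets.
    """
    indices, targets = [], []
    for i, lab in enumerate(labels):
        if lab in positive:
            indices.append(i)
            targets.append(1)
        elif lab in negative:
            indices.append(i)
            targets.append(-1)
    return indices, targets


def gaussian_kernel(x, y, gamma=0.1):
    """Gaussian kernel; the gamma parameterization is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: sample i goes to fold i % k, so fold sizes differ by at most one.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


if __name__ == "__main__":
    for name, _, _ in binary_tasks(CLASSES):
        print(name)
```

In a full experiment, each of the six tasks would be trained and scored once per fold, and the per-fold accuracies averaged to give the overall estimate of the probability of a correct prediction. Splitting by residue index, as done here, is a simplification; splitting by whole protein is another reasonable choice.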
Keywords: Neural Networks, Support Vector Machines, Protein Secondary Structure Prediction |
|