SVM-cabins: Prediction of solvent accessibility using accumulation cutoff set and support vector machine

Cited by: 26
Authors
Wang, Jung-Ying
Lee, Hahn-Ming
Ahmad, Shandar [1]
Affiliations
[1] Jamia Millia Islamia, Dept Biosci, New Delhi 110025, India
[2] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
[3] Lunghwa Univ Sci & Technol, Dept Multimedia & Game Sci, Tao Yuan 333, Taiwan
[4] Acad Sinica, Inst Sci Informat, Taipei 115, Taiwan
[5] Natl Inst Biomed Innovat, Osaka, Japan
Keywords
relative solvent accessibility; protein structure prediction; support vector machine
DOI
10.1002/prot.21422
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A number of methods for predicting levels of solvent accessibility or accessible surface area (ASA) of amino acid residues in proteins have been developed. These methods predict either regularly spaced states of relative solvent accessibility or an analogue real value indicating relative solvent accessibility. While discrete states of exposure can be easily obtained by post-prediction assignment of thresholds to the predicted or computed real values of ASA, the reverse, that is, obtaining a real value from quantized states of predicted ASA, is not straightforward, as a two-state prediction in such cases would give large real-valued errors. However, predicting ASA into a larger number of states and then finding a corresponding scheme for real value prediction may help integrate the two approaches to ASA prediction. We report a novel method of obtaining numerical real values of solvent accessibility, using an accumulation cutoff set and a support vector machine. This so-called SVM-Cabins method first predicts discrete states of ASA of amino acid residues from their evolutionary profile and then maps the predicted states onto a real-valued linear space by simple algebraic methods. The resulting performance of this approach using 13-state ASA prediction is at least comparable with the best methods of ASA prediction reported so far. The mean absolute error of this method reaches its best value of 15.1% on the tested data set of 502 proteins, with a coefficient of correlation equal to 0.66. Since the method starts with the prediction of discrete states of ASA and leads to real value predictions, prediction performance in binary states and in real values is optimized simultaneously.
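The abstract does not give the cutoff values or the algebraic mapping, so the following is only a minimal sketch of the general idea under stated assumptions: a set of cumulative cutoffs is assumed, one binary classifier per cutoff is assumed to answer "is relative ASA above this cutoff?", the predicted state is taken as the number of cutoffs exceeded, and the real value is taken as the midpoint of the corresponding interval. The cutoff list `CUTOFFS` and the function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical cutoffs in percent relative ASA (an assumption, not the
# paper's values). Twelve cutoffs partition 0-100% into 13 states.
CUTOFFS = [5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90]


def state_from_votes(votes: Sequence[bool]) -> int:
    """Return the discrete state as the number of cutoffs predicted exceeded.

    votes[k] is assumed to be the output of a binary classifier that
    answers "is relative ASA greater than CUTOFFS[k]?".
    """
    return sum(bool(v) for v in votes)


def state_to_value(state: int, cutoffs: Sequence[float] = CUTOFFS) -> float:
    """Map a discrete state to a real value: the midpoint of its interval."""
    edges = [0.0, *cutoffs, 100.0]
    return (edges[state] + edges[state + 1]) / 2


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute difference between predicted and observed values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# A residue whose first three cutoffs are predicted exceeded falls in
# state 3, i.e. the 15-20% interval under the assumed cutoffs.
votes = [True, True, True] + [False] * 9
state = state_from_votes(votes)
value = state_to_value(state)
```

Under these assumptions, the binary decision at any single cutoff and the real-valued estimate come from the same set of classifier outputs, which is the property the abstract highlights.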
Pages: 82-91
Number of pages: 10