Predicting protein structural class by SVM with class-wise optimized features and decision probabilities

Times cited: 53
Authors
Anand, Ashish [1]
Pugalenthi, Ganesan [1]
Suganthan, P. N. [1]
Affiliation
[1] Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore 639798, Singapore
Keywords
multi-class SVM; probability outputs SVM; SCOP class classification
DOI
10.1016/j.jtbi.2008.02.031
Chinese Library Classification (CLC)
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Determining a protein's structural class solely from sequence information is a challenging task, and various methods have been applied to it in the literature. We present a support vector machine (SVM) approach that combines class-wise optimized feature sets with probability-based decisions. It differs from earlier attempts in two respects: (1) it uses class-wise optimized features, and (2) the decisions of the individual SVM classifiers are coupled with probability estimates to make the final prediction. The algorithm was tested on three datasets containing 498, 1092 and 5261 domains, and its performance was assessed by ten-fold external cross-validation. A high accuracy of 92.89% was obtained on the 498-domain dataset. On the 1092-domain dataset we achieved 54.67% accuracy, better than the previously reported best of 53.8%. On the larger and less redundant 5261-domain dataset we obtained 59.43% prediction accuracy. We also compared class-wise features with the union of these features (the conventional approach) in a one-vs.-all SVM framework; the results clearly show the advantage of class-wise optimized features. A brief analysis of the selected class-wise features indicates their biological significance. © 2008 Elsevier Ltd. All rights reserved.
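The decision scheme described in the abstract (one binary one-vs.-all classifier per structural class, each working on its own optimized feature subset, with the final class chosen from the classifiers' probability estimates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: linear decision functions with hand-picked weights stand in for trained SVMs, Platt-style sigmoid calibration is assumed for the probability outputs, and the class labels used, the feature indices, all numeric parameters, and the names `OneVsAllModel` and `predict` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class OneVsAllModel:
    """Hypothetical 'class k vs. rest' scorer restricted to its own feature subset."""
    label: str
    feature_idx: list[int]  # class-wise feature subset S_k (illustrative indices)
    weights: list[float]    # stand-in for a trained linear SVM weight vector w
    bias: float             # stand-in for the SVM bias b
    platt_a: float          # assumed Platt sigmoid slope A (normally fitted on held-out data)
    platt_b: float          # assumed Platt sigmoid offset B

    def decision_value(self, x: list[float]) -> float:
        # f(x) = w . x_S + b, using only the features selected for this class
        return sum(w * x[i] for w, i in zip(self.weights, self.feature_idx)) + self.bias

    def probability(self, x: list[float]) -> float:
        # Platt scaling: P(class k | x) ~= 1 / (1 + exp(A * f(x) + B)), with A < 0
        f = self.decision_value(x)
        return 1.0 / (1.0 + math.exp(self.platt_a * f + self.platt_b))


def predict(models: list[OneVsAllModel], x: list[float]) -> tuple[str, dict[str, float]]:
    """Pick the class whose one-vs.-all model reports the highest probability.

    The per-class estimates are independent, so they need not sum to 1.
    """
    probs = {m.label: m.probability(x) for m in models}
    return max(probs, key=probs.get), probs


# Illustrative usage with made-up parameters for four structural classes.
models = [
    OneVsAllModel("all-alpha", [0, 2], [2.0, 3.0], -1.0, -1.5, 0.0),
    OneVsAllModel("all-beta", [1, 3], [4.0, 2.0], -1.0, -1.5, 0.0),
    OneVsAllModel("alpha/beta", [2, 4, 5], [1.0, 1.0, 1.0], -1.0, -1.5, 0.0),
    OneVsAllModel("alpha+beta", [0, 1, 4], [1.0, 1.0, 1.0], -0.8, -1.5, 0.0),
]
x = [0.12, 0.05, 0.30, 0.08, 0.20, 0.25]  # hypothetical feature vector for one domain
label, probs = predict(models, x)
print(label, {k: round(v, 3) for k, v in probs.items()})
```

The point of the class-wise design is visible in `feature_idx`: each binary model sees a different slice of the feature vector rather than the union of all selected features. If, as the term "external cross-validation" usually implies, feature selection is part of training, the subsets and the sigmoid parameters would have to be chosen inside each training fold so that the accuracy estimate is not optimistically biased.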
Pages: 375-380
Number of pages: 6