An Overview on Protein Fold Classification via Machine Learning Approach

Cited by: 5
Authors
Tian, Xiaoyu [1]
Chen, Daozheng [1]
Gao, Jun [1]
Affiliation
[1] Shanghai Maritime University, College of Information Engineering, Shanghai 201306, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Classification; dataset; ensemble classifier; feature extraction; machine learning; protein fold; secondary structure prediction; support vector machines; amino acid composition; physicochemical properties; homology detection; rotation forest; scoring matrix; general form; web server
DOI
10.2174/1570164614666171030160312
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein fold classification plays a key role in protein functional analysis, molecular biology, cell biology, biomedicine and drug design. Methods for classifying protein folds fall roughly into two categories: taxonomy-based methods and template-based methods. Owing to their strong performance, machine learning algorithms have been widely applied in taxonomy-based methods. In this review, we discuss the most popular and representative machine-learning-based taxonomy methods, covering three key aspects: datasets, feature extraction methods, and classification algorithms. We compare the overall accuracies of methods that use the same classifier with different feature vectors, and summarize development trends and potential research directions. This review aims to help researchers choose appropriate materials and develop new classification methods in this area.
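As a rough illustration of the taxonomy-based pipeline described in the abstract (extract a feature vector from each sequence, train a classifier, report overall accuracy), the Python sketch below uses amino acid composition (one of the listed keywords) as the feature vector and a 1-nearest-neighbor classifier. It is a toy sketch under stated assumptions, not the method of this review: the sequences and fold labels are hypothetical, and the keywords point to richer features (scoring matrices, predicted secondary structure, physicochemical properties) and classifiers (SVMs, ensembles) in the surveyed work.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(sequence):
    """Amino acid composition: the fraction of each standard residue (20-dim)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_neighbor_predict(train, query_vec):
    """1-NN: return the fold label of the closest training feature vector.

    `train` is a list of (feature_vector, fold_label) pairs.
    """
    label, _ = min(
        ((lab, squared_distance(vec, query_vec)) for vec, lab in train),
        key=lambda pair: pair[1],
    )
    return label


def overall_accuracy(y_true, y_pred):
    """Fraction of test proteins whose predicted fold matches the true fold."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: real benchmarks use curated fold datasets.
    train_data = [("AAAAGGGG", "fold_A"), ("LLLLVVVV", "fold_B")]
    test_data = [("AAAGGGGA", "fold_A"), ("VVLLLLVV", "fold_B")]

    train = [(aac_features(seq), label) for seq, label in train_data]
    preds = [nearest_neighbor_predict(train, aac_features(s)) for s, _ in test_data]
    truth = [label for _, label in test_data]
    print(preds, overall_accuracy(truth, preds))  # ['fold_A', 'fold_B'] 1.0
```

Swapping the feature function while keeping the classifier fixed mirrors the kind of comparison the abstract describes (same classifier, different feature vectors).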
Pages: 85-98
Page count: 14