Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection

Cited by: 268
Authors
Liu, Bin [1 ,2 ,3 ,4 ]
Zhang, Deyuan [5 ]
Xu, Ruifeng [1 ,2 ]
Xu, Jinghao [1 ]
Wang, Xiaolong [1 ,2 ]
Chen, Qingcai [1 ,2 ]
Dong, Qiwen [6 ]
Chou, Kuo-Chen [4 ,7 ]
Affiliations
[1] Shenzhen Grad Sch, Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[2] Shenzhen Grad Sch, Harbin Inst Technol, Key Lab Network Oriented Intelligent Computat, Shenzhen 518055, Guangdong, Peoples R China
[3] Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[4] Gordon Life Sci Inst, Belmont, MA 02478 USA
[5] Shenyang Aerosp Univ, Sch Comp, Shenyang, Liaoning, Peoples R China
[6] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[7] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia
Funding
National Natural Science Foundation of China;
Keywords
AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; CYCLIN-DEPENDENT KINASE-5; LATENT SEMANTIC ANALYSIS; ENZYME SUBFAMILY CLASSES; NEURONAL CDK5 ACTIVATOR; FOLD RECOGNITION; PREDICTION; ALIGNMENT; DOMAINS;
DOI
10.1093/bioinformatics/btt709
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Owing to its importance in both basic research (such as molecular evolution and protein attribute prediction) and practical applications (such as the timely modeling of 3D structures of proteins targeted for drug development), protein remote homology detection has attracted a great deal of interest. Notably, the profile-based approach is promising and holds high potential in this regard. To further improve protein remote homology detection, a key step is finding an optimal way to extract the evolutionary information contained in the profiles.
Results: Here, we propose a novel approach, called profile-based protein representation, that extracts evolutionary information from frequency profiles. The frequency profiles are calculated from the multiple sequence alignments generated by PSI-BLAST. Three top-performing sequence-based kernels (SVM-Ngram, SVM-pairwise and SVM-LA) were combined with the profile-based protein representation. Various tests were conducted on a SCOP benchmark dataset containing 54 families and 23 superfamilies. The results showed that the new approach is promising and clearly improves the performance of all three kernels. Furthermore, our approach can provide useful insights for studying the features of proteins in various families. It has not escaped our notice that the current approach can easily be combined with existing sequence-based methods to improve their performance as well.
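The abstract describes the pipeline only at a high level: frequency profiles are computed from PSI-BLAST alignments, turned into a profile-based representation, and then fed to sequence-based kernels such as SVM-Ngram. The Python sketch below illustrates that idea under clearly stated assumptions, and it is not the authors' exact procedure. The frequency profile is taken as raw, unweighted per-column amino acid frequencies (real PSI-BLAST profiles use sequence weighting and pseudocounts). The conversion to a profile-based sequence simply keeps the most frequent residue in each column. The kernel is a plain dot product of n-gram counts. All function names (`frequency_profile`, `profile_sequence`, `ngram_counts`, `ngram_kernel`) are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids; gaps ('-') and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(alignment):
    """Raw per-column amino acid frequencies of an MSA (assumption: no weighting, no pseudocounts)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return profile


def profile_sequence(profile, query):
    """Hypothetical conversion: keep the most frequent residue in each column.

    Falls back to the query residue when a column holds no standard amino acid.
    """
    residues = []
    for col, freqs in enumerate(profile):
        best = max(AMINO_ACIDS, key=freqs.get)  # ties resolve to the first in AMINO_ACIDS
        residues.append(best if freqs[best] > 0 else query[col])
    return "".join(residues)


def ngram_counts(seq, n=2):
    """Counts of overlapping n-grams (the feature space of an n-gram kernel)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_kernel(s1, s2, n=2):
    """Linear kernel: dot product of the two n-gram count vectors."""
    c1, c2 = ngram_counts(s1, n), ngram_counts(s2, n)
    return sum(count * c2[gram] for gram, count in c1.items())


if __name__ == "__main__":
    # Toy alignment; the first row plays the role of the query protein.
    aln = ["MKV-L", "MRVAL", "MKIAL"]
    seq = profile_sequence(frequency_profile(aln), aln[0])
    print(seq)                         # MKVAL
    print(ngram_kernel(seq, seq))      # 4
    print(ngram_kernel(seq, "MRVAL"))  # 2
```

In this sketch the kernel only sees strings, so an existing sequence-based kernel can be applied unchanged to the profile-derived sequences. Here the gap in the query's fourth column is filled with the alignment's consensus residue `A`.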
Pages: 472-479
Number of pages: 8