Protein binding hot spots prediction from sequence only by a new ensemble learning method

Cited by: 33
Authors
Hu, Shan-Shan [1, 2]
Chen, Peng [1, 2, 4, 5]
Wang, Bing [3]
Li, Jinyan [4, 5]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, Inst Hlth Sci, Hefei 230601, Anhui, Peoples R China
[3] Anhui Univ Technol, Sch Elect & Informat Engn, Maanshan 243032, Anhui, Peoples R China
[4] Univ Technol, Adv Analyt Inst, Broadway, NSW 2007, Australia
[5] Univ Technol, Ctr Hlth Technol, Broadway, NSW 2007, Australia
Funding
National Natural Science Foundation of China
Keywords
Hot spot residue; Ensemble system; IBk; Amino acids; Interfaces; Database; Identification; Energy; Server; Accessibility; Recognition; Information; Stability
DOI
10.1007/s00726-017-2474-6
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Hot spots are the core interfacial regions of binding proteins and have been used as targets in drug design. Locating hot spots experimentally is costly in both time and money. In silico computational methods have recently been widely used to predict hot spots from sequence or structural features. Because protein structures are not always solved, identifying hot spots from amino acid sequence alone is more useful in real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area derived from amino acid sequences for hot spot prediction. The model consists of 83 classifiers based on the IBk (instance-based k-nearest neighbours) algorithm, in which instances are encoded by important properties selected from the 544 properties in the AAindex1 (Amino Acid Index) database. The top-performing classifiers are then combined into an ensemble by majority voting. The ensemble classifier outperforms state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set.
Availability: http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.
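The selection-plus-voting scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' HotspotEC implementation: each base classifier is a plain k-nearest-neighbour classifier (the technique behind Weka's IBk) restricted to one feature subset, standing in for the different AAindex1 property groups combined with relative accessible surface area; members are ranked by F1 = 2PR/(P + R) on a held-out validation split; and the helper names (`knn_predict`, `select_members`, `ensemble_predict`), k = 3, top_n = 3 and all feature values are hypothetical toy choices.

```python
import math
from collections import Counter

# Hypothetical sketch: k-NN (IBk-style) base classifiers on different feature
# subsets, F1-based member selection, then majority voting. Not the paper's code.

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training instances (Euclidean)."""
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def project(rows, cols):
    """Keep only the feature columns assigned to one base classifier."""
    return [[row[c] for c in cols] for row in rows]

def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive (hot spot) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_members(train_X, train_y, val_X, val_y, feature_sets, top_n, k=3):
    """Score one k-NN classifier per feature subset on validation data; keep the top_n."""
    scored = []
    for cols in feature_sets:
        Xp = project(train_X, cols)
        preds = [knn_predict(Xp, train_y, [x[c] for c in cols], k) for x in val_X]
        scored.append((f1_score(val_y, preds), cols))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable sort: ties keep input order
    return [cols for _, cols in scored[:top_n]]

def ensemble_predict(train_X, train_y, members, x, k=3):
    """Majority vote over the selected base classifiers (an odd count avoids ties)."""
    votes = Counter(
        knn_predict(project(train_X, cols), train_y, [x[c] for c in cols], k)
        for cols in members
    )
    return votes.most_common(1)[0][0]

# Toy residues: [property value, relative ASA, uninformative noise]; 1 = hot spot.
train_X = [[0.9, 0.2, 0.5], [0.8, 0.1, 0.1], [0.85, 0.15, 0.9],
           [0.1, 0.8, 0.4], [0.2, 0.9, 0.8], [0.15, 0.7, 0.2]]
train_y = [1, 1, 1, 0, 0, 0]
val_X = [[0.88, 0.12, 0.3], [0.12, 0.85, 0.6]]
val_y = [1, 0]

feature_sets = [(0,), (1,), (2,), (0, 1)]
members = select_members(train_X, train_y, val_X, val_y, feature_sets, top_n=3)
print(members)  # [(0,), (1,), (0, 1)]: the noise-only classifier is dropped
print(ensemble_predict(train_X, train_y, members, [0.82, 0.18, 0.0]))  # 1
```

Ranking members on data kept apart from the k-NN training instances is one reasonable way to pick "top-performance classifiers"; how the paper actually scored and selected its 83 classifiers is not stated in the abstract.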
Pages: 1773-1785
Page count: 13