Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

Times cited: 14
Authors
Jiang, Jinjian [1, 2]
Wang, Nian [1 ]
Chen, Peng [3 ]
Zheng, Chunhou [4 ]
Wang, Bing [5 ]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Anqing Normal Univ, Sch Comp & Informat, Anqing 246133, Peoples R China
[3] Anhui Univ, Inst Hlth Sci, Hefei 230601, Anhui, Peoples R China
[4] Anhui Univ, Sch Elect Engn & Automat, Hefei 230601, Anhui, Peoples R China
[5] Anhui Univ Technol, Sch Elect & Informat Engn, Maanshan 243032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
random projection; hot spots; IBk; ensemble system; COMPUTATIONAL HOT-SPOTS; BINDING-ENERGY; INTERFACES; RESIDUES; DATABASE; MUTATIONS; COMPLEXES; ACCURACY; FEATURES; SERVER;
DOI
10.3390/ijms18071543
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
Hotspot residues play an important role in protein-protein interactions and often perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use information from solved protein structures. However, far fewer protein structures have been solved than sequences are available. Moreover, almost all existing predictors identify hotspots within the interfaces of protein complexes and seldom from whole protein sequences. Therefore, methods that determine hotspots from whole protein sequences using sequence information alone are urgently needed. To address hotspot prediction from whole protein sequences, we propose an ensemble system with random projections that uses statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Next, the random projection technique is adopted to project the encoded instances into a reduced space. Several of the better-performing random projections are then selected by training an IBk classifier on the training dataset, and these are applied to the test dataset, yielding an ensemble of random projection classifiers. Experimental results show that, although the performance of our method is not yet good enough for real applications, it is promising for determining hotspot residues from whole sequences.
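The pipeline outlined in the abstract (project, score each projection with a nearest-neighbour classifier, keep the better ones, vote) can be illustrated with a minimal sketch. This is not the authors' implementation. IBk is Weka's k-nearest-neighbour classifier, and a plain k-NN stands in for it here. The function names, the Gaussian projection matrix, the use of a held-out split scored by accuracy to rank projections, and every parameter value are assumptions made for illustration. The input is synthetic toy data, not the sequence-profile and AAindex1 encoding.

```python
import random
from collections import Counter

def random_matrix(d_in, d_out, rng):
    """Gaussian random projection matrix (d_in x d_out), scaled by 1/sqrt(d_out)."""
    scale = 1.0 / d_out ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_out)] for _ in range(d_in)]

def project(x, R):
    """Project one feature vector x (length d_in) into the reduced space."""
    return [sum(x[i] * R[i][j] for i in range(len(x))) for j in range(len(R[0]))]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda n: sum((a - b) ** 2 for a, b in zip(train_X[n], x)))
    votes = Counter(train_y[n] for n in order[:k])
    return votes.most_common(1)[0][0]

def fit_ensemble(X, y, X_val, y_val, d_out=4, n_candidates=20, n_keep=5, k=3, seed=0):
    """Score candidate projections on a held-out split (assumed) and keep the best."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        R = random_matrix(len(X[0]), d_out, rng)
        PX = [project(x, R) for x in X]  # projected training set
        preds = [knn_predict(PX, y, project(x, R), k) for x in X_val]
        acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
        scored.append((acc, R, PX))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(R, PX) for _, R, PX in scored[:n_keep]]

def predict_ensemble(members, y, x, k=3):
    """Each retained projection casts one k-NN vote; the majority label wins."""
    votes = Counter(knn_predict(PX, y, project(x, R), k) for R, PX in members)
    return votes.most_common(1)[0][0]

def toy_data(n, dim, rng):
    """Synthetic two-class vectors for illustration; NOT real hotspot features."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y

rng = random.Random(42)
X_train, y_train = toy_data(40, 8, rng)
X_val, y_val = toy_data(20, 8, rng)
members = fit_ensemble(X_train, y_train, X_val, y_val)
label = predict_ensemble(members, y_train, [2.0] * 8)
print(len(members), label)
```

Accuracy is used for ranking only to keep the sketch short. Hotspot residues are rare relative to non-hotspots in a whole sequence, so a measure that accounts for class imbalance would likely be a better selection criterion in practice.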
Pages: 13