Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier

Cited by: 37
Authors
Wang, Lei [1, 2]
You, Zhu-Hong [3]
Xia, Shi-Xiong [1]
Liu, Feng [4]
Chen, Xing [5]
Yan, Xin [6]
Zhou, Yong [1]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang 277100, Shandong, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
[4] China Natl Coal Assoc, Beijing 100713, Peoples R China
[5] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
[6] Zaozhuang Univ, Sch Foreign Languages, Zaozhuang 277100, Shandong, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Position-specific scoring matrix; Multiple sequences alignments; Rotation forest; Cancer; SEQUENCE-BASED PREDICTION; ROTATION FOREST; PSI-BLAST; TOOL; HYPERPLANES; GENERATION; DATABASE;
DOI
10.1016/j.jtbi.2017.01.003
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Protein-Protein Interactions (PPIs) are essential to most biological processes and play a critical role in most cellular functions. With the development of high-throughput biological techniques and in silico methods, a large amount of PPI data has been generated for various organisms, but many problems remain unsolved. These factors have promoted the development of in silico methods based on machine learning to predict PPIs. In this study, we propose a novel method that combines an ensemble Rotation Forest (RF) classifier with the Discrete Cosine Transform (DCT) algorithm to predict interactions among proteins. Specifically, the protein amino acid sequence is transformed into a Position-Specific Scoring Matrix (PSSM) containing biological evolutionary information; a feature vector representing this evolutionary information is then extracted using the DCT algorithm; finally, the ensemble rotation forest model is used to predict whether a given protein pair interacts. On the Yeast and H. pylori data sets, the proposed method achieved average accuracies of 98.54% and 88.27%, respectively. In addition, it achieved prediction accuracies of 98.08%, 92.75%, 98.87% and 98.72% on independent data sets (C. elegans, E. coli, H. sapiens and M. musculus, respectively). To further evaluate the performance of our method, we compare it with the state-of-the-art Support Vector Machine (SVM) classifier and obtain good results.
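The feature-extraction step described in the abstract (PSSM, then DCT, then a fixed-length vector) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which DCT variant is used or how many coefficients are kept, so the orthonormal DCT-II, the `keep_rows`/`keep_cols` parameters, and all function names below are assumptions made for illustration. The classifier stage (rotation forest) is not shown.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (assumed variant)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(matrix):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in matrix]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[j][i] holds the coefficient at (i, j); transpose back.
    return [list(r) for r in zip(*cols)]


def pssm_dct_features(pssm, keep_rows=2, keep_cols=2):
    """Keep the low-frequency (top-left) block of DCT coefficients.

    `pssm` is a list of rows (one per residue, normally 20 columns).
    The block size is a hypothetical choice, not taken from the paper.
    """
    coeffs = dct_2d(pssm)
    return [coeffs[i][j] for i in range(keep_rows) for j in range(keep_cols)]


def pair_features(pssm_a, pssm_b, keep_rows=2, keep_cols=2):
    """Concatenate the two proteins' descriptors into one pair vector."""
    return (pssm_dct_features(pssm_a, keep_rows, keep_cols)
            + pssm_dct_features(pssm_b, keep_rows, keep_cols))
```

Because the kept block has a fixed size, proteins of different lengths map to vectors of the same length, which is what a downstream classifier needs. Each PSSM must have at least `keep_rows` rows and `keep_cols` columns.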
Pages: 105-110
Page count: 6
Related papers
31 in total
  • [21] PPIevo: Protein-protein interaction prediction from PSSM based evolutionary information
    Zahiri, Javad
    Yaghoubi, Omid
    Mohammad-Noori, Morteza
    Ebrahimpour, Reza
    Masoudi-Nejad, Ali
    GENOMICS, 2013, 102 (04) : 237 - 242
  • [22] Prediction of protein structural class using tri-gram probabilities of position-specific scoring matrix and recursive feature elimination
    Peiying Tao
    Taigang Liu
    Xiaowei Li
    Lanming Chen
    Amino Acids, 2015, 47 : 461 - 468
  • [23] Sequence-Based Prediction of Protein-Protein Interactions Using Pseudo Substitution Matrix Representation Features and Ensemble Rotation Forest Classifier in HIV (Human Immunodeficiency Virus)
    Lestari, D.
    Hartomo, S.
    Bustamam, A.
    PROCEEDINGS OF THE 3RD INTERNATIONAL SYMPOSIUM ON CURRENT PROGRESS IN MATHEMATICS AND SCIENCES 2017 (ISCPMS2017), 2018, 2023
  • [24] An improved sequence-based prediction protocol for protein-protein interactions using amino acids substitution matrix and rotation forest ensemble classifiers
    You, Zhu-Hong
    Li, Xiao
    Chan, Keith C. C.
    NEUROCOMPUTING, 2017, 228 : 277 - 282
  • [25] Using Correlation Analysis and Nonnegative Matrix Factorization to Predict Protein Structural Classes via Position-Specific Scoring Matrix
    Liang, Yunyun
    Liu, Sanyang
    Zhang, Shengli
    MATCH-COMMUNICATIONS IN MATHEMATICAL AND IN COMPUTER CHEMISTRY, 2016, 75 (03) : 743 - 758
  • [26] Sequence-Based Prediction of Protein-Protein Interactions Using Ensemble Based Classifier Combined with Global Encoding in HIV (Human Immunodeficiency Virus)
    Lestari, D.
    Musti, M. I. S.
    Bustamam, A.
    PROCEEDINGS OF THE 3RD INTERNATIONAL SYMPOSIUM ON CURRENT PROGRESS IN MATHEMATICS AND SCIENCES 2017 (ISCPMS2017), 2018, 2023
  • [27] Prediction of protein modification sites of gamma-carboxylation using position specific scoring matrices based evolutionary information
    Gao, Jianzhao
    Zhang, Ning
    Ruan, Jishou
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2013, 47 : 215 - 220
  • [28] Prediction of Protein-Protein Interactions from Sequences using a Correlation Matrix of the Physicochemical Properties of Amino Acids
    Kopoin, Charlemagne N'Diffon
    Atiampo, Armand Kodjo
    N'Guessan, Behou Gerard
    Babri, Michel
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2021, 21 (03) : 41 - 47
  • [29] Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis
    You, Zhu-Hong
    Lei, Ying-Ke
    Zhu, Lin
    Xia, Junfeng
    Wang, Bing
    BMC BIOINFORMATICS, 2013, 14
  • [30] Using Weighted Extreme Learning Machine Combined with Scale-Invariant Feature Transform to Predict Protein-Protein Interactions from Protein Evolutionary Information
    Li, Jianqiang
    Shi, Xiaofeng
    You, Zhuhong
    Chen, Zhuangzhuang
    Lin, Qiuzhen
    Fang, Min
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, PT I, 2018, 10954 : 527 - 532