Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier

Cited by: 37
Authors
Wang, Lei [1, 2]
You, Zhu-Hong [3]
Xia, Shi-Xiong [1]
Liu, Feng [4]
Chen, Xing [5]
Yan, Xin [6]
Zhou, Yong [1]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang 277100, Shandong, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
[4] China Natl Coal Assoc, Beijing 100713, Peoples R China
[5] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
[6] Zaozhuang Univ, Sch Foreign Languages, Zaozhuang 277100, Shandong, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Position-specific scoring matrix; Multiple sequences alignments; Rotation forest; Cancer; SEQUENCE-BASED PREDICTION; ROTATION FOREST; PSI-BLAST; TOOL; HYPERPLANES; GENERATION; DATABASE;
DOI
10.1016/j.jtbi.2017.01.003
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
Protein-protein interactions (PPIs) are essential to most biological processes and play a critical role in most cellular functions. With the development of high-throughput biological techniques and in silico methods, a large amount of PPI data has been generated for various organisms, but many problems remain unsolved. These factors have promoted the development of in silico methods based on machine learning to predict PPIs. In this study, we propose a novel method that combines an ensemble Rotation Forest (RF) classifier with the Discrete Cosine Transform (DCT) algorithm to predict interactions among proteins. Specifically, the protein amino acid sequence is transformed into a Position-Specific Scoring Matrix (PSSM) containing biological evolutionary information; a feature vector representing this evolutionary information is then extracted from the PSSM using the DCT algorithm; finally, the ensemble rotation forest model is used to predict whether a given protein pair interacts. On the Yeast and H. pylori data sets, the proposed method achieved average accuracies of 98.54% and 88.27%, respectively. In addition, it achieved prediction accuracies of 98.08%, 92.75%, 98.87% and 98.72% on independent data sets (C. elegans, E. coli, H. sapiens and M. musculus, respectively). To further evaluate the performance of our method, we compare it with the state-of-the-art Support Vector Machine (SVM) classifier and obtain good results.
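The feature-extraction step described in the abstract (PSSM, then DCT, then a fixed-length vector per protein) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of an orthonormal DCT-II, and the number of low-frequency coefficients kept (`keep_rows`, `keep_cols`) are assumptions, since the abstract does not specify them. The toy matrix stands in for a real PSSM, which would normally come from PSI-BLAST, and the rotation forest classifier is omitted.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence (assumed variant)."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(matrix):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to L x 20


def pssm_dct_features(pssm, keep_rows=4, keep_cols=4):
    """Keep the top-left (low-frequency) block of DCT coefficients.

    The block size is a hypothetical choice; it only needs to be the
    same for every protein so that all vectors have equal length.
    The PSSM must have at least keep_rows rows.
    """
    coeffs = dct_2d(pssm)
    return [coeffs[r][c] for r in range(keep_rows) for c in range(keep_cols)]


def pair_features(pssm_a, pssm_b, keep_rows=4, keep_cols=4):
    """Concatenate the two proteins' vectors to describe one protein pair."""
    return (pssm_dct_features(pssm_a, keep_rows, keep_cols)
            + pssm_dct_features(pssm_b, keep_rows, keep_cols))


# Toy stand-ins for PSSMs of two proteins of length 6 and 9 (L x 20 each).
toy_a = [[1.0] * 20 for _ in range(6)]
toy_b = [[float((i + j) % 5) for j in range(20)] for i in range(9)]

features_a = pssm_dct_features(toy_a)
features_pair = pair_features(toy_a, toy_b)
```

Because only a fixed block of coefficients is retained, proteins of different lengths map to vectors of the same size, which is what allows a standard classifier to be trained on the concatenated pair vectors.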
Pages: 105-110
Page count: 6
Related papers
31 in total
  • [1] On Position-Specific Scoring Matrix for Protein Function Prediction
    Jeong, Jong Cheol
    Lin, Xiaotong
    Chen, Xue-Wen
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (02) : 308 - 315
  • [2] RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human
    Liu, Xin
    Lu, Yaping
    Wang, Liang
    Geng, Wei
    Shi, Xinyi
    Zhang, Xiao
    BIG DATA MINING AND ANALYTICS, 2023, 6 (01) : 21 - 31
  • [3] An Ensemble Classifier to Predict Protein-Protein Interactions by Combining PSSM-based Evolutionary Information with Local Binary Pattern Model
    Li, Yang
    Li, Li-Ping
    Wang, Lei
    Yu, Chang-Qing
    Wang, Zheng
    You, Zhu-Hong
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2019, 20 (14)
  • [4] Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier
    Li, Zheng-Wei
    You, Zhu-Hong
    Chen, Xing
    Li, Li-Ping
    Huang, De-Shuang
    Yan, Gui-Ying
    Nie, Ru
    Huang, Yu-An
    ONCOTARGET, 2017, 8 (14) : 23638 - 23649
  • [5] Apoptosis Protein Subcellular Location Prediction Based on Position-Specific Scoring Matrix
    Yao, Yu-Hua
    Shi, Zhuo-Xing
    Dai, Qi
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2014, 11 (10) : 2073 - 2078
  • [6] An Ensemble Classifier with Random Projection for Predicting Protein-Protein Interactions Using Sequence and Evolutionary Information
    Song, Xiao-Yu
    Chen, Zhan-Heng
    Sun, Xiang-Yang
    You, Zhu-Hong
    Li, Li-Ping
    Zhao, Yang
    APPLIED SCIENCES-BASEL, 2018, 8 (01)
  • [7] Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model
    An, Ji-Yong
    Meng, Fan-Rong
    You, Zhu-Hong
    Chen, Xing
    Yan, Gui-Ying
    Hu, Ji-Pu
    PROTEIN SCIENCE, 2016, 25 (10) : 1825 - 1833
  • [8] Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier
    Chen, Cheng
    Zhang, Qingmei
    Yu, Bin
    Yu, Zhaomin
    Lawrence, Patrick J.
    Ma, Qin
    Zhang, Yan
    COMPUTERS IN BIOLOGY AND MEDICINE, 2020, 123
  • [9] CPIELA: Computational Prediction of Plant Protein-Protein Interactions by Ensemble Learning Approach From Protein Sequences and Evolutionary Information
    Li, Li-Ping
    Zhang, Bo
    Cheng, Li
    FRONTIERS IN GENETICS, 2022, 13
  • [10] Inferring homologous protein-protein interactions through pair position specific scoring matrix
    Lin, Chun-Yu
    Chen, Yung-Chiang
    Lo, Yu-Shu
    Yang, Jinn-Moon
    BMC BIOINFORMATICS, 2013, 14