NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information

Times cited: 3
Authors
Jia, Li-Na [1]
Yan, Xin [2, 3]
You, Zhu-Hong [4]
Zhou, Xi [4]
Li, Li-Ping [4]
Wang, Lei [1, 4]
Song, Ke-Jian [5]
Affiliations
[1] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang, Peoples R China
[2] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Jiangsu, Peoples R China
[3] Zaozhuang Univ, Sch Foreign Languages, Zaozhuang, Peoples R China
[4] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
[5] Jiangxi Univ Sci & Technol, Sch Informat Engn, Ganzhou, Peoples R China
Source
EVOLUTIONARY BIOINFORMATICS | 2020 / Volume 16
Funding
West Light Foundation of the Chinese Academy of Sciences; National Natural Science Foundation of China;
Keywords
Self-interacting protein; natural language processing; evolutionary information; stacked auto-encoder;
DOI
10.1177/1176934320984171
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Studying self-interacting proteins (SIPs) not only reveals protein function at the molecular level but is also crucial to understanding processes such as growth, development, differentiation, and apoptosis, and it provides an important theoretical basis for exploring the mechanisms of major diseases. With rapid advances in biotechnology, a large number of SIPs have been discovered. However, because biological experiments are slow and costly, the gap between the identification of SIPs and the accumulation of data is growing. Fast and accurate computational methods are therefore needed to predict SIPs effectively. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first treat the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features with the Stacked Auto-Encoder (SAE) deep learning algorithm. Finally, we fuse the natural language features of proteins with the evolutionary features and make predictions with an Extreme Learning Machine (ELM) classifier. On the human and yeast SIP gold standard data sets, NLPEI achieved prediction accuracies of 94.19% and 91.29%, respectively. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicate that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
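The final step of the abstract (fusing two feature views and classifying with an ELM) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ELMClassifier`, the hidden-layer size, the sigmoid activation, fusion by simple concatenation, and the toy XOR-style data are all assumptions made for illustration. It only shows the general ELM idea of a fixed random hidden layer whose output weights are solved in closed form by least squares.

```python
import numpy as np


class ELMClassifier:
    """Illustrative Extreme Learning Machine (hypothetical helper, not from the paper)."""

    def __init__(self, n_hidden=16, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights: minimum-norm least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Toy stand-ins for the two feature views (NOT real protein features).
nlp_feats = np.array([[0.0], [0.0], [1.0], [1.0]])
evo_feats = np.array([[0.0], [1.0], [0.0], [1.0]])
labels = np.array([0, 1, 1, 0])  # XOR pattern, not linearly separable

# Fusion by concatenation is an assumption; the abstract does not specify the scheme.
fused = np.hstack([nlp_feats, evo_feats])
model = ELMClassifier(n_hidden=16, seed=0).fit(fused, labels)
predictions = model.predict(fused)
```

Because only the output weights are fitted, training reduces to a single linear solve, which is the usual argument for ELM's speed. In practice the two feature blocks would come from the NLP and PSSM/SAE stages rather than from toy arrays.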
Pages: 12