Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods

Cited by: 6
Authors
Shirafkan, Farshid [1]
Gharaghani, Sajjad [1]
Rahimian, Karim [2]
Sajedi, Reza Hasan [3]
Zahiri, Javad [4, 5]
Affiliations
[1] Univ Tehran, Inst Biochem & Biophys, Lab Bioinformat & Drug Design, Tehran, Iran
[2] Tarbiat Modares Univ, Fac Biol Sci, Dept Biophys, Bioinformat & Computat Omics Lab BioCOOL, Tehran, Iran
[3] Tarbiat Modares Univ, Fac Biol Sci, Dept Biochem, Tehran, Iran
[4] Univ Calif San Diego, Dept Neurosci, La Jolla, CA 92093 USA
[5] Univ Calif San Diego, Dept Pediat, La Jolla, CA 92093 USA
Keywords
Moonlighting protein; Multitasking proteins; Physico-chemical properties; PSSM; Outlier; Random forest; SVM; bioinformatics; SEQUENCE; UPDATE
DOI
10.1186/s12859-021-04194-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which a single polypeptide chain performs more than one independent, usually distinct, function. The main reasons to study MPs are to identify unknown cellular processes, understand novel protein mechanisms, improve the prediction of protein functions, and gain information about protein evolution. MPs also play an important role in disease pathways and drug-target discovery. Because detecting MPs experimentally is challenging, most of them have been discovered by chance. A computational approach to predicting MPs is therefore warranted.

Results: In this study, we introduce a model that distinguishes MPs from non-MPs using features extracted from protein sequences. We also set up a scheme for detecting outlier proteins. To this end, 37 distinct feature vectors were used to study each feature vector's impact on detecting MPs, and 8 different classification methods were assessed to find the best performer. To detect outliers, each classifier was run 100 times with tenfold cross-validation on each feature vector, and proteins that were misclassified 90 times or more were grouped. This process was applied to every feature vector, and the intersection of the resulting groups was taken as the set of outlier proteins. Tenfold cross-validation on a dataset of 351 samples (215 moonlighting and 136 non-moonlighting proteins) shows that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. The outlier analysis showed that 57 of the 351 proteins in the dataset are candidate outliers. Among them were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were obtained by computational methods, the results of this study cast doubt on the assumption that they are non-moonlighting at all.

Conclusions: MPs are difficult to identify experimentally. Using distinct feature vectors, our method enabled the identification of novel moonlighting proteins. The study also indicated that a number of proteins labeled as non-MPs are likely to be moonlighting.
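
The outlier-detection scheme described in the abstract (repeated tenfold cross-validation, a misclassification threshold, then an intersection across feature vectors) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the toy data, and the 1-nearest-neighbor stand-in classifier are all assumptions introduced here, and only a single classifier is run rather than the 8 methods assessed in the study.

```python
import random
from collections import Counter

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with roughly equal class ratios."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    pos = 0
    for members in by_class.values():
        rng.shuffle(members)
        for idx in members:
            folds[pos % k].append(idx)  # deal indices out round-robin
            pos += 1
    return folds

def nearest_neighbor_predict(train_X, train_y, x):
    """Hypothetical stand-in classifier (1-NN); NOT the SVM used in the paper."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def misclassification_counts(X, y, repeats=100, k=10, seed=0):
    """Per sample, count the CV repeats in which it was misclassified."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(repeats):
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            train_X = [X[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            for i in fold:
                if nearest_neighbor_predict(train_X, train_y, X[i]) != y[i]:
                    counts[i] += 1
    return counts

def find_outliers(feature_sets, y, repeats=100, k=10, threshold=90, seed=0):
    """Intersect, across feature sets, the samples misclassified >= threshold times."""
    groups = []
    for X in feature_sets:
        counts = misclassification_counts(X, y, repeats, k, seed)
        groups.append({i for i, c in counts.items() if c >= threshold})
    return set.intersection(*groups) if groups else set()

# Toy data (made up): two tight clusters plus sample 18, which sits next to
# the class-1 cluster but carries label 0.
feats_a = ([(i * 0.1, j * 0.1) for i in range(3) for j in range(3)]
           + [(5 + i * 0.1, 5 + j * 0.1) for i in range(3) for j in range(3)]
           + [(5.5, 5.5)])
labels = [0] * 9 + [1] * 9 + [0]
# In a second feature set, sample 0 also looks anomalous, but only there.
feats_b = [(4.5, 4.5)] + feats_a[1:]

outliers = find_outliers([feats_a, feats_b], labels,
                         repeats=20, k=5, threshold=18)
print(sorted(outliers))
```

In this toy setup, sample 0 is flagged in only one feature set and so drops out of the intersection, which is the role the intersection step plays in the abstract's description. The defaults (`repeats=100`, `k=10`, `threshold=90`) mirror the numbers stated in the abstract; the smaller values in the usage example only keep the toy run small.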
Pages: 14