iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model

Cited by: 72
Authors
Akbar, Shahid [1]
Ahmad, Ashfaq [1]
Hayat, Maqsood [1]
Rehman, Ateeq Ur [2]
Khan, Salman [1]
Ali, Farman [3]
Affiliations
[1] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, KP, Pakistan
[2] Univ Haripur, Dept Informat Technol, KP, Pakistan
[3] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Keywords
Antitubercular peptides; Ensemble classification; Genetic algorithm; One-hot encoding; Composite physiochemical properties; k-fold cross-validation test; TUBERCULOSIS; CLASSIFIER; PROTEIN; IDENTIFICATION; SITES; DNA;
DOI
10.1016/j.compbiomed.2021.104778
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacterium Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented, but the effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative because of its excellent selectivity in targeting specific cells without affecting normal cells. However, with the rapid growth in peptide samples, accurate prediction has become challenging, and an intelligent and reliable prediction model is indispensable for effectively identifying antitubercular peptides. This study uses an ensemble learning approach to improve prediction results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physiochemical properties, and one-hot encoding. The feature vectors produced by these feature extraction methods were then combined to generate a heterogeneous vector. Finally, using the individual and heterogeneous vectors, five classification models of distinct nature were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the proposed model's prediction and training capabilities. On the training and independent datasets, the proposed ensemble model achieved accuracies of 94.47% and 92.68%, respectively. The proposed "iAtbP-Hyb-EnC" model outperformed existing predictors, reporting ~10% higher training accuracy. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
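The feature-fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the repository linked above for that). The function names, the fixed length `max_len=10`, the gap `k=1`, and the reading of "k-space amino acid composition" as the frequency of residue pairs separated by k positions are all assumptions made for illustration. The composite physiochemical properties are omitted because the abstract does not list them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def one_hot(peptide, max_len=10):
    """Flattened one-hot encoding, truncated or zero-padded to max_len residues.

    max_len is an illustrative assumption, not a value from the paper.
    """
    vec = [0.0] * (max_len * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide[:max_len]):
        if aa in AA_INDEX:  # non-standard residues are left as all zeros
            vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def k_spaced_pairs(peptide, k=1):
    """Normalized frequency of residue pairs with k residues between them.

    This is one assumed interpretation of the k-space composition feature.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(peptide) - k - 1):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def hybrid_vector(peptide, max_len=10, k=1):
    """Concatenate the individual feature vectors into one heterogeneous vector."""
    return one_hot(peptide, max_len) + k_spaced_pairs(peptide, k)


if __name__ == "__main__":
    features = hybrid_vector("ACDKL")  # hypothetical example peptide
    print(len(features))
```

With these assumed settings each peptide maps to a vector of 10 × 20 one-hot entries followed by 400 pair frequencies, which could then be passed to any of the individual classifiers or to an ensemble.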
Pages: 9