Tree based models for classification of membrane and secreted proteins in heart

Times cited: 0
Authors
Sona Charles [1]
A. Subeesh [2]
Jeyakumar Natarajan [3]
Affiliations
[1] Bharathiar University, Data Mining and Text Mining Laboratory, Department of Bioinformatics
[2] ICAR-Indian Institute of Spices Research, Division of Crop Improvement and Biotechnology
[3] ICAR-Central Institute of Agricultural Engineering, Agricultural Mechanization Division
Keywords
Machine learning; Membrane proteins; Mutual information; Secretory proteins; Tree-based algorithms
DOI
10.1007/s42485-024-00131-1
Abstract
Computational differentiation of membrane and secreted proteins is a challenging and interesting topic in bioinformatics. Differentiating membrane from secreted proteins experimentally is both laborious and time-consuming. In this study, we used tree-based classifiers, namely decision trees, random forest (RF), light gradient boosting machine, gradient boosting decision tree and extreme gradient boosting (XGBoost) trees, with sequence-based descriptors, viz. amino acid composition, dipeptide composition, conjoint triads, composition/transition/distribution (CTD) and pseudo amino acid composition, to enhance the predictive power of the algorithms. RF on CTD discriminated the classes better in the secreted versus non-secreted and secreted versus membrane classification problems, while in membrane versus non-membrane [...]. Feature selection using mutual information considerably increased the prediction accuracy of the models. Multiclass models to distinguish membrane proteins and secreted proteins from other proteins in the heart were enhanced by the addition of protein interaction network-based features, with the highest accuracy achieved by XGBoost. The models developed using tree-based algorithms are expected to be useful for the classification and annotation of proteins with known sequences.
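As an illustration of two of the sequence-based descriptors named in the abstract (amino acid composition and dipeptide composition) and of mutual information as a feature-scoring criterion, a minimal Python sketch follows. It is not the authors' implementation: the function names, the toy sequence, and the discrete (count-based) mutual information estimate are assumptions made for illustration, and the study may have used a different estimator and toolkit.

```python
from collections import Counter
from itertools import product
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values, fixed order)."""
    counts = Counter(seq.upper())
    n = sum(counts[a] for a in AMINO_ACIDS)  # ignore non-standard symbols
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs among adjacent pairs."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    n = sum(pairs[k] for k in keys)
    return [pairs[k] / n if n else 0.0 for k in keys]


def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences.

    Assumption: features are already discretized (e.g. binned); this is a
    plain count-based estimate, not the estimator used in the paper.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


if __name__ == "__main__":
    toy = "ACAC"  # hypothetical toy sequence, not real data
    print(amino_acid_composition(toy)[:2])
    print(dipeptide_composition(toy)[:2])
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

In a pipeline of the kind the abstract describes, descriptor vectors like these would be computed per protein, scored against the class labels, and the highest-scoring features passed to a tree-based classifier.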
Pages: 147-157
Page count: 10