The influence of reduced amino acid alphabets on prediction orthologous protein thermostability

Times cited: 0
Authors
Jiang, Yuxin [1]
Yuan, Xiaoyu [1]
Zheng, Shizhe [1]
Luo, Silin [1]
Chen, Haidong [1]
Ding, Yanrui [1]
Affiliation
[1] Jiangnan Univ, Sch Sci, Wuxi 214122, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reduced amino acid alphabet; Thermostability; Sequence feature; Fuzzy clustering; Orthologous protein; Discrimination; Stability
DOI
10.1007/s11756-025-01935-2
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Sequence features play a vital role in determining protein thermostability. Reduced amino acid alphabets (RAAs) lower data complexity while retaining key sequence information. We therefore evaluate the performance of 672 RAAs in predicting the thermostability of orthologous proteins. We calculate the amino acid composition (AAC), dipeptide composition (DPC), and tripeptide composition (TPC) of the reduced sequences and use a random forest model to make predictions. The results show that 10 RAAs selected by fuzzy clustering are effective in predicting thermostability differences between orthologous protein pairs and significantly improve prediction efficiency. We further predict the melting temperature difference ΔTm caused by point mutations and find that the RAA EQ-H-K-DN-IL-P-T-FY-M-R-S-W-A-C-G-V can fit the small thermostability changes caused by point mutations. Our work shows that reduction methods based on fuzzy clustering can effectively retain the key sequence features that affect protein thermostability. This reduces computational complexity and increases prediction accuracy.
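The reduction and composition steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' code. The assumptions are:

- Each hyphen-separated group of the EQ-H-K-DN-IL-P-T-FY-M-R-S-W-A-C-G-V alphabet collapses to its first letter. This choice of representative symbol is arbitrary.
- Composition features are normalized k-mer frequencies.
- `pair_features` is a hypothetical helper that encodes an orthologous pair as the element-wise difference of the two feature vectors. The abstract does not say how pairs are encoded.

```python
from collections import Counter
from itertools import product

# The 16-group alphabet named in the abstract. Each hyphen-separated group is
# collapsed to one symbol; using the group's first letter is an assumption.
RAA = "EQ-H-K-DN-IL-P-T-FY-M-R-S-W-A-C-G-V"


def build_mapping(raa: str) -> dict[str, str]:
    """Map each of the 20 standard residues to its group's representative letter."""
    mapping = {}
    for group in raa.split("-"):
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(seq: str, mapping: dict[str, str]) -> str:
    """Rewrite a protein sequence in the reduced alphabet, skipping non-standard residues."""
    return "".join(mapping[r] for r in seq.upper() if r in mapping)


def kmer_composition(reduced: str, symbols: list[str], k: int) -> list[float]:
    """Normalized k-mer frequencies over a fixed vocabulary (k=1 AAC, k=2 DPC, k=3 TPC)."""
    n_windows = len(reduced) - k + 1
    counts = Counter(reduced[i:i + k] for i in range(max(n_windows, 0)))
    total = max(n_windows, 1)  # avoid division by zero for very short sequences
    return [counts["".join(p)] / total for p in product(symbols, repeat=k)]


def features(seq: str, raa: str = RAA) -> list[float]:
    """Concatenate AAC, DPC, and TPC of the reduced sequence into one vector."""
    mapping = build_mapping(raa)
    symbols = [group[0] for group in raa.split("-")]
    reduced = reduce_sequence(seq, mapping)
    return [x for k in (1, 2, 3) for x in kmer_composition(reduced, symbols, k)]


def pair_features(seq_a: str, seq_b: str, raa: str = RAA) -> list[float]:
    """Hypothetical pair encoding: element-wise difference of two feature vectors."""
    return [a - b for a, b in zip(features(seq_a, raa), features(seq_b, raa))]


if __name__ == "__main__":
    print(reduce_sequence("MKDNILEQ", build_mapping(RAA)))  # MKDDIIEE
    print(len(features("MKDNILEQ")))  # 4368
```

This alphabet has 16 groups, so the combined AAC+DPC+TPC vector has 16 + 256 + 4,096 = 4,368 dimensions. The full 20-letter alphabet would give 20 + 400 + 8,000 = 8,420. This difference is one concrete sense in which an RAA reduces data complexity. The resulting vectors could then be passed to a random forest, for example scikit-learn's `RandomForestClassifier`. That step is not shown here.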
Pages: 1823-1833
Number of pages: 11