PRFold-TNN: Protein Fold Recognition With an Ensemble Feature Selection Method Using PageRank Algorithm Based on Transformer
Times cited: 0
Authors:
Qin, Xinyi [1]
Zhang, Lu [2]
Liu, Min [1]
Liu, Guangzhong [1]
Affiliations:
[1] Shanghai Maritime Univ, Coll Informat Engn, Shanghai 201306, Peoples R China
[2] Jiangnan Univ, Sch Internet Things Engn, Wuxi 214063, Jiangsu, Peoples R China
Keywords:
Feature extraction;
Hidden Markov models;
Amino acids;
Protein sequence;
Predictive models;
Prediction algorithms;
Vectors;
ASTRAL;
ensemble feature selection method;
PageRank algorithm;
protein fold recognition;
transformer;
SCORING MATRIX;
CLASSIFICATION;
PREDICTION;
PROBABILITIES;
NMR;
DOI:
10.1109/TCBB.2024.3414497
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Subject classification codes:
071010;
081704;
Abstract:
Knowledge of the tertiary structures of proteins benefits many aspects of human life, and protein fold recognition is an important means of understanding protein structure. Researchers have proposed a variety of protein fold recognition methods, but because protein structure databases are continuously updated, novel and effective computational methods are still needed. In this study, we construct a new protein structure dataset named AT and propose the PRFold-TNN model for protein fold recognition. First, several feature extraction methods, namely AAC, HMM, HMM-Bigram and ACC, are applied to extract the corresponding features from protein sequences. Then, an ensemble feature selection method based on the PageRank algorithm, which integrates several tree-based algorithms, is used to screen the fused features. Finally, a Transformer-based classifier produces the final prediction. Experiments show prediction accuracies of 86.27% on the AT dataset and 88.91% on the independent test set, indicating that the model achieves superior performance and generalization ability in protein fold recognition. We also evaluate the model on the DD, EDD and TG benchmark datasets, where it achieves prediction accuracies of 88.41%, 97.91% and 95.16%, respectively, at least 3.0%, 0.8% and 2.5% higher than those of state-of-the-art methods. These results indicate that PRFold-TNN outperforms the compared methods on these benchmarks.
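The abstract does not say how the PageRank algorithm combines the outputs of the tree-based algorithms. The Python sketch below shows one plausible construction, offered as an assumption and not as the authors' method. Each tree-based model is assumed to yield a ranking of features by importance, and each ranking adds a "vote" edge from every lower-ranked feature to every higher-ranked one. PageRank over the resulting graph then gives a consensus score, and the top-k features are kept. The function names `pagerank_aggregate` and `select_top_k`, the damping factor 0.85 and the feature names in the example are all hypothetical.

```python
from collections import defaultdict


def pagerank_aggregate(rankings, damping=0.85, tol=1e-10, max_iter=1000):
    """Fuse several feature rankings into one consensus score per feature.

    rankings: list of lists, each ordering feature names from most to least
    important (e.g. importances from different tree-based models).
    ASSUMED graph construction (not taken from the paper): whenever feature u
    is ranked above feature v in a ranking, add weight 1 to the edge v -> u,
    so lower-ranked features "vote" for higher-ranked ones.
    """
    nodes = sorted({f for r in rankings for f in r})
    if not nodes:
        return {}
    index = {f: i for i, f in enumerate(nodes)}
    n = len(nodes)

    # out_edges[v][u] accumulates the weight of the edge v -> u.
    out_edges = [defaultdict(float) for _ in range(n)]
    for ranking in rankings:
        for pos, better in enumerate(ranking):
            for worse in ranking[pos + 1:]:
                out_edges[index[worse]][index[better]] += 1.0
    out_weight = [sum(edges.values()) for edges in out_edges]

    # Standard PageRank power iteration, starting from a uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(scores[v] for v in range(n) if out_weight[v] == 0)
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for v in range(n):
            # Dangling nodes have no entries here, so no division by zero.
            for u, w in out_edges[v].items():
                new[u] += damping * scores[v] * w / out_weight[v]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return dict(zip(nodes, scores))


def select_top_k(rankings, k, **kwargs):
    """Keep the k features with the highest aggregated PageRank score."""
    scores = pagerank_aggregate(rankings, **kwargs)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


if __name__ == "__main__":
    # Hypothetical rankings from three tree-based selectors; the feature
    # names only mimic the AAC / HMM / ACC feature groups.
    rankings = [
        ["HMM_1", "AAC_A", "ACC_3", "AAC_G"],
        ["HMM_1", "ACC_3", "AAC_A", "AAC_G"],
        ["ACC_3", "HMM_1", "AAC_A", "AAC_G"],
    ]
    print(pagerank_aggregate(rankings))
    print(select_top_k(rankings, 2))
```

In this example, AAC_G is last in every ranking, so it receives only the teleportation share and the lowest score. Because PageRank normalizes each node's outgoing weight, a feature that is outranked only once passes all of its score to that one feature. As a result, ACC_3 ends up slightly ahead of HMM_1 here. A different edge weighting (for example, by rank gap) would change this behavior.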
Pages: 1740-1751
Page count: 12