PRFold-TNN: Protein Fold Recognition With an Ensemble Feature Selection Method Using PageRank Algorithm Based on Transformer

Citations: 0
Authors
Qin, Xinyi [1 ]
Zhang, Lu [2 ]
Liu, Min [1 ]
Liu, Guangzhong [1 ]
Affiliations
[1] Shanghai Maritime Univ, Coll Informat Engn, Shanghai 201306, Peoples R China
[2] Jiangnan Univ, Sch Internet Things Engn, Wuxi 214063, Jiangsu, Peoples R China
Keywords
Feature extraction; Hidden Markov models; Amino acids; Protein sequence; Predictive models; Prediction algorithms; Vectors; ASTRAL; ensemble feature selection method; PageRank algorithm; protein fold recognition; transformer; SCORING MATRIX; CLASSIFICATION; PREDICTION; PROBABILITIES; NMR;
DOI
10.1109/TCBB.2024.3414497
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Understanding the tertiary structure of proteins helps explain their function, which bears on many aspects of human life. Protein fold recognition is an important route to determining protein structure. Researchers have proposed a variety of methods for protein fold recognition, but as protein structure databases are continuously updated, new and effective computational methods are still needed. In this study, we develop a new protein structure dataset named AT and propose the PRFold-TNN model for protein fold recognition. First, four feature extraction methods (AAC, HMM, HMM-Bigram and ACC) are used to extract features from protein sequences. Then an ensemble feature selection method, which uses the PageRank algorithm to integrate several tree-based algorithms, screens the fused features. Finally, a classifier based on the Transformer model makes the prediction. Experiments show a prediction accuracy of 86.27% on the AT dataset and 88.91% on the independent test set, indicating that the model performs well and generalizes in protein fold recognition. We also evaluate the model on the DD, EDD and TG benchmark datasets, where it achieves prediction accuracies of 88.41%, 97.91% and 95.16%, which are at least 3.0%, 0.8% and 2.5% higher, respectively, than those of state-of-the-art methods. We conclude that the PRFold-TNN model outperforms existing methods on these benchmarks.
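To make two of the steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The `aac` function computes the standard amino acid composition (the fraction of each of the 20 standard residues). The `ensemble_select` function illustrates one possible way PageRank could merge several feature rankings: the abstract does not say how the graph is built, so the "lower-ranked feature points to higher-ranked feature" edges, the function names, and the damping value of 0.85 are all assumptions made for illustration. In practice the input rankings might come from the importance scores of different tree-based models.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank; weights[j][i] is the weight of edge j -> i."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for j in range(n):
            if out_sums[j] == 0:
                # Dangling node: spread its score uniformly over all nodes.
                share = damping * scores[j] / n
                for i in range(n):
                    new[i] += share
            else:
                for i in range(n):
                    if weights[j][i]:
                        new[i] += damping * scores[j] * weights[j][i] / out_sums[j]
        scores = new
    return scores


def ensemble_select(rankings, n_features, top_k):
    """Merge feature rankings (best first) with PageRank; return top_k indices.

    Assumed graph construction (hypothetical, not from the paper): each time
    a ranker places feature `better` above feature `worse`, the edge
    worse -> better gains one unit of weight.
    """
    weights = [[0.0] * n_features for _ in range(n_features)]
    for ranking in rankings:
        for pos, better in enumerate(ranking):
            for worse in ranking[pos + 1:]:
                weights[worse][better] += 1.0
    scores = pagerank(weights)
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order[:top_k], scores


if __name__ == "__main__":
    print(aac("MKVLAAGIVG")[:3])  # composition of A, C, D
    # Two hypothetical rankers that agree on the order of three features.
    selected, scores = ensemble_select([[0, 1, 2], [0, 1, 2]], 3, 2)
    print(selected, scores)
```

Note that with this edge construction PageRank does not always agree with a simple majority vote when the rankers disagree, because a top feature passes its score along its own outgoing edges. That is one reason to treat the graph design here as an assumption and not as the published method.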
Pages: 1740-1751
Page count: 12