FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network

Cited by: 48
Authors
Shao, Jiangyi [1]
Yan, Ke [2]
Liu, Bin [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
protein fold recognition; seq-to-seq model; seq-to-cluster model; cluster-to-cluster model; HIDDEN MARKOV-MODELS; HOMOLOGY DETECTION; CLASSIFICATION; INFORMATION; FEATURES;
DOI
10.1093/bib/bbaa144
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
As a key step in studying protein structures, protein fold recognition plays an important role in predicting the protein structures associated with COVID-19 and other important structures. However, existing computational predictors focus only on pairwise protein similarity or on the similarity between two groups of proteins from two folds. The homology relationships among proteins, by contrast, form a hierarchical structure, so a global protein similarity network can contribute to improved performance. In this study, we propose a predictor called FoldRec-C2C that globally incorporates the interactions among proteins into the prediction. In FoldRec-C2C, protein fold recognition is treated as an information retrieval task in natural language processing. The initial ranking results are generated by a supervised ranking algorithm, Learning to Rank, and three re-ranking algorithms (the seq-to-seq model, the seq-to-cluster model and the cluster-to-cluster model (C2C)) are then applied to the ranking lists to adjust the results globally based on the protein similarity network. When tested on the widely used and rigorous LINDAHL benchmark dataset, FoldRec-C2C outperforms 34 other state-of-the-art methods in this field. The source code and data of FoldRec-C2C can be downloaded from http://bliulab.net/FoldRec-C2C/download.
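The general idea of adjusting an initial ranking with a similarity network can be illustrated with a toy sketch. This is not the FoldRec-C2C algorithm: the function name `rerank`, the blending weight `alpha`, the dictionary layout and the weighted-average rule are all assumptions made for illustration. It only shows how scores borrowed from a query's network neighbours might change the order of candidate templates.

```python
from typing import Dict, List

# Hypothetical layout: scores[protein][template] -> float
Scores = Dict[str, Dict[str, float]]


def rerank(query: str, initial: Scores, network: Scores,
           alpha: float = 0.5) -> List[str]:
    """Re-rank templates for `query` (illustrative rule, not the paper's).

    Each template's score is a blend of the query's own initial score and
    the similarity-weighted average of its neighbours' scores for the
    same template. `alpha` = 1.0 keeps the initial ranking unchanged.
    """
    neighbours = network.get(query, {})
    total_weight = sum(neighbours.values())
    blended: Dict[str, float] = {}
    for template, own_score in initial[query].items():
        borrowed = 0.0
        if total_weight > 0:
            # Weighted average of the neighbours' scores for this template.
            borrowed = sum(
                weight * initial.get(name, {}).get(template, 0.0)
                for name, weight in neighbours.items()
            ) / total_weight
        blended[template] = alpha * own_score + (1 - alpha) * borrowed
    # Highest blended score first.
    return sorted(blended, key=blended.get, reverse=True)


# Made-up scores: q1 slightly prefers tA, but its neighbour q2 strongly
# prefers tB.
initial_scores = {
    "q1": {"tA": 0.6, "tB": 0.5},
    "q2": {"tA": 0.1, "tB": 0.9},
}
similarity_network = {"q1": {"q2": 1.0}}

print(rerank("q1", initial_scores, similarity_network))
```

In this made-up example the neighbour's strong preference for `tB` moves it ahead of `tA`, while `alpha=1.0` reproduces the initial order.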
Pages: 11
Related papers
5 in total
  • [1] Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores
    Yan, Ke
    Wen, Jie
    Liu, Jin-Xing
    Xu, Yong
    Liu, Bin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (05) : 2008 - 2016
  • [2] DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks
    Liu, Bin
    Li, Chen-Chen
    Yan, Ke
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (05) : 1733 - 1741
  • [3] ResCNNT-fold: Combining residual convolutional neural network and Transformer for protein fold recognition from language model embeddings
    Qin, Xinyi
    Liu, Min
    Liu, Guangzhong
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 166
  • [4] An Enhanced Protein Fold Recognition for Low Similarity Datasets Using Convolutional and Skip-Gram Features With Deep Neural Network
    Bankapur, Sanjay
    Patil, Nagamma
    IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2021, 20 (01) : 42 - 49
  • [5] Protein fold recognition with a two-layer method based on SVM-SA, WP-NN and C4.5 (TLM-SNC)
    Zangooei, Mohammad Hossein
    Jalili, Saeed
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2013, 8 (02) : 203 - 223