Fast and accurate protein intrinsic disorder prediction by using a pretrained language model

Cited by: 8
Authors
Song, Yidong [1]
Yuan, Qianmu [1]
Chen, Sheng [1]
Chen, Ken [1]
Zhou, Yaoqi [2,5]
Yang, Yuedong [1,3,4]
Affiliations
[1] Sun Yat-sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Shenzhen Bay Lab, Inst Syst & Phys Biol, Shenzhen, Peoples R China
[3] Sun Yat-sen Univ, Sch Comp Sci & Engn, Guangzhou 510000, Peoples R China
[4] Sun Yat-sen Univ, MOE, Key Lab Machine Intelligence & Adv Comp, Guangzhou 510000, Peoples R China
[5] Shenzhen Bay Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
sequence-based; protein intrinsic disorder prediction; pretrained language model; UNSTRUCTURED PROTEINS; WEB SERVER; SEQUENCES; REGIONS; RECOGNITION; SECONDARY; DATABASE; DISPROT; TOOL;
DOI
10.1093/bib/bbad173
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Determining intrinsically disordered regions of proteins is essential for elucidating protein biological functions and the mechanisms of their associated diseases. As the gap between the number of experimentally determined protein structures and the number of protein sequences continues to grow exponentially, there is a need for an accurate and computationally efficient disorder predictor. However, current single-sequence-based methods have low accuracy, while evolutionary profile-based methods are computationally intensive. Here, we propose LMDisorder, a fast and accurate protein disorder predictor that uses embeddings generated by unsupervised pretrained language models as features. We show that LMDisorder performs best among all single-sequence-based methods and is comparable to or better than another language-model-based technique on four independent test sets. Furthermore, LMDisorder shows performance equivalent to or even better than that of the state-of-the-art profile-based technique SPOT-Disorder2. In addition, the high computational efficiency of LMDisorder enabled proteome-scale analysis of human proteins, showing that proteins with high predicted disorder content are associated with specific biological functions. The datasets, source code, and trained model are available at https://github.com/biomed-AI/LMDisorder.
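The notion of "predicted disorder content" in the abstract can be illustrated with a minimal sketch. This is not the LMDisorder code: the function names, the 0.5 cutoff, and the example scores are assumptions made for illustration, and the per-residue scores are taken to be probabilities in [0, 1] produced by some disorder predictor.

```python
from typing import Dict, List, Sequence, Tuple


def disorder_content(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose predicted disorder score is >= threshold.

    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    if len(scores) == 0:
        raise ValueError("scores must contain at least one residue")
    disordered = sum(1 for s in scores if s >= threshold)
    return disordered / len(scores)


def rank_by_disorder_content(
    proteome: Dict[str, Sequence[float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Sort protein IDs by disorder content, most disordered first."""
    contents = [
        (protein_id, disorder_content(scores, threshold))
        for protein_id, scores in proteome.items()
    ]
    return sorted(contents, key=lambda item: item[1], reverse=True)


# Hypothetical per-residue scores for two made-up proteins.
example = {
    "protA": [0.9, 0.8, 0.7, 0.2],
    "protB": [0.1, 0.2, 0.6, 0.3],
}
ranking = rank_by_disorder_content(example)
```

A ranking of this kind is one plausible starting point for asking which functions are enriched among highly disordered proteins; the enrichment analysis itself is outside the scope of the sketch.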
Pages: 10