DisPredict3.0: Prediction of intrinsically disordered regions/proteins using protein language model

Times cited: 2
Authors
Ul Kabir, Md Wasi [1]
Hoque, Md Tamjidul [1]
Affiliation
[1] Univ New Orleans, Dept Comp Sci, New Orleans, LA 70148 USA
Funding
National Institutes of Health (NIH);
Keywords
Protein language models; Intrinsically disordered proteins; Predict disordered protein; Machine learning; ACCURATE; DATABASE; DISPROT;
DOI
10.1016/j.amc.2024.128630
Chinese Library Classification (CLC) number
O29 [Applied Mathematics];
Subject classification code
070104;
Abstract
Intrinsically disordered proteins (IDPs) or protein regions (IDRs) lack a stable three-dimensional structure, yet they perform important biological functions. They differ markedly from ordered proteins both structurally and functionally and can cause many critical diseases. Accurate identification of disordered proteins/regions has a significant impact on fields such as drug design, protein engineering, protein design, and related research. However, experimental identification of IDRs is complex and time-consuming, which calls for an accurate and efficient computational method. Recent deep learning protein language models have shown the ability to learn evolutionary information from billions of protein sequences. This motivated us to develop a computational method, named DisPredict3.0, that predicts disordered regions (IDRs) in proteins using evolutionary information from a protein language model. Compared with the state-of-the-art method in the CAID (2018) assessment, DisPredict3.0 achieves improvements of 2.51%, 16.13%, 17.98%, and 11.94% in AUC, F1-score, MCC, and kappa, respectively. In addition, in the CAID-2 assessment (2022), DisPredict3.0 shows promising results and ranks first for disordered-residue prediction on the Disorder-NOX dataset. The DisPredict3.0 webserver is available at https://bmll.cs.uno.edu.
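The abstract reports results in terms of AUC, F1-score, MCC, and kappa but does not restate how these are defined. The following is a minimal sketch of the standard formulas, assuming a per-residue binary setup (label 1 = disordered, 0 = ordered) and a hypothetical 0.5 cutoff on predicted disorder scores. It is not the authors' or CAID's evaluation code; the function names, the cutoff, and the toy data are illustrative assumptions only.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN for binary per-residue labels (1 = disordered)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("empty input")
    p_obs = (tp + tn) / n
    # Chance agreement from the predicted and true label marginals.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random disordered residue outscores a
    random ordered one (ties count 0.5). O(P*N), fine for a small sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 8 residues, disorder scores thresholded at 0.5.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f1_score(labels, preds))     # 0.75
print(mcc(labels, preds))          # 0.5
print(cohen_kappa(labels, preds))  # 0.5
print(roc_auc(labels, scores))     # 0.9375
```

AUC is computed from the raw scores, while F1, MCC, and kappa depend on the chosen cutoff, which is one reason threshold-dependent and threshold-free metrics can move by different amounts.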
Pages: 14