DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model

Citations: 13
Authors
Pang, Yihe [1]
Liu, Bin [1, 2]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, 5 South Zhongguancun St, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, 5 South Zhongguancun St, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein intrinsic disorder; Disordered function prediction; Protein language model; Graph-based interaction protein language model; BINDING REGIONS; WEB SERVER; SEQUENCE; VIF; IDENTIFICATION; DOMAIN; INFORMATION; EVOLUTION; DISPROT; GENE;
DOI
10.1186/s12915-023-01803-y
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Intrinsically disordered proteins and regions (IDPs/IDRs) are functionally important proteins and regions that exist as highly dynamic conformations under natural physiological conditions. IDPs/IDRs exhibit a broad range of molecular functions, which involve both binding interactions with partners and the retention of native structural flexibility. The rapid increase in the number of proteins in sequence databases and the diversity of disordered functions challenge existing computational methods for predicting protein intrinsic disorder and disordered functions. A disordered region interacts with different partners to perform multiple functions, and these disordered functions exhibit different dependencies and correlations. In this study, we introduce DisoFLAG, a computational method that leverages a graph-based interaction protein language model (GiPLM) to jointly predict disorder and its multiple potential functions. GiPLM integrates protein semantic information from pre-trained protein language models into graph-based interaction units to enhance the correlation of the semantic representations of multiple disordered functions. The DisoFLAG predictor takes amino acid sequences as its only input and predicts intrinsic disorder and six disordered functions for proteins: protein-binding, DNA-binding, RNA-binding, ion-binding, lipid-binding, and flexible linker. We evaluated the predictive performance of DisoFLAG following the Critical Assessment of protein Intrinsic Disorder (CAID) experiments, and the results demonstrated that DisoFLAG offers accurate and comprehensive predictions of disordered functions, extending the current coverage of computationally predicted disordered function categories. A standalone package and web server for DisoFLAG have been made available to provide accurate prediction tools for intrinsic disorder and its associated functions.
Pages: 15