Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields

Times cited: 15
Authors
Liu, Yumeng [1]
Chen, Shengyu [2]
Wang, Xiaolong [1]
Liu, Bin [1, 3, 4]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Indiana Univ, Sch Informat Comp & Engn, Bloomington, IN 47408 USA
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[4] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China
Source
MOLECULAR THERAPY-NUCLEIC ACIDS | 2019 / Vol. 17
Funding
National Natural Science Foundation of China
Keywords
ACCURATE PREDICTION; UNSTRUCTURED REGIONS; SEQUENCE; SERVER;
DOI
10.1016/j.omtn.2019.06.004
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, an ensemble of three CRF-based predictors: IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are designed to predict long, short, and generic disordered regions, respectively, and each is constructed from different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with predictors specific to IDRs of different lengths. Experimental results on two independent test datasets show that IDP-FSP achieves predictive performance better than or at least comparable to that of 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP.
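The abstract does not state how the outputs of the three length-specific predictors are merged, so the following is only an illustrative sketch of one possible ensemble step, not the authors' method. It assumes each predictor emits a per-residue disorder probability and that the ensemble takes an unweighted mean followed by a fixed threshold. The function names, the 0.5 threshold, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def ensemble_disorder(per_model_probs: Dict[str, List[float]],
                      threshold: float = 0.5) -> List[int]:
    """Average per-residue disorder probabilities and threshold them.

    ASSUMPTION: unweighted averaging and the 0.5 cutoff are illustrative
    choices; the abstract does not specify the combination rule.
    """
    if not per_model_probs:
        raise ValueError("at least one predictor is required")
    lengths = {len(p) for p in per_model_probs.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same residues")
    n_models = len(per_model_probs)
    # zip(*...) walks the sequence position by position across predictors.
    averaged = [sum(column) / n_models
                for column in zip(*per_model_probs.values())]
    return [1 if p >= threshold else 0 for p in averaged]


def disordered_regions(labels: List[int]) -> List[Tuple[int, int]]:
    """Collapse per-residue labels into 0-based inclusive (start, end) runs."""
    regions = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i                      # a disordered run begins
        elif label == 0 and start is not None:
            regions.append((start, i - 1))  # the run ended at i - 1
            start = None
    if start is not None:                   # run extends to the last residue
        regions.append((start, len(labels) - 1))
    return regions


# Hypothetical scores for a four-residue sequence.
scores = {
    "long":    [0.9, 0.8, 0.2, 0.1],
    "short":   [0.7, 0.6, 0.4, 0.2],
    "generic": [0.8, 0.4, 0.3, 0.3],
}
labels = ensemble_disorder(scores)
segments = disordered_regions(labels)
```

A weighted average or a length-aware selection rule could replace the plain mean without changing the surrounding code.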
Pages: 396-404
Number of pages: 9