Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields

Times cited: 15

Authors:

Liu, Yumeng [1]

Chen, Shengyu [2]

Wang, Xiaolong [1]

Liu, Bin [1,3,4]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China

[2] Indiana Univ, Sch Informat Comp & Engn, Bloomington, IN 47408 USA

[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China

[4] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China

Source:

MOLECULAR THERAPY-NUCLEIC ACIDS | 2019 / Vol. 17

Funding:

National Natural Science Foundation of China;

Keywords:

ACCURATE PREDICTION; UNSTRUCTURED REGIONS; SEQUENCE; SERVER;

DOI:

10.1016/j.omtn.2019.06.004

Chinese Library Classification:

R-3 [Medical research methods]; R3 [Basic medicine];

Discipline code:

1001;

Abstract:

Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed to predict different types of IDRs. Compared with these classification-based predictors, the previously proposed IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), achieves state-of-the-art performance in predicting IDPs/IDRs. Motivated by these methods, we propose IDP-FSP, an ensemble of three CRF-based predictors: IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are designed to predict long, short, and generic disordered regions, respectively, and each is built on a different set of features. To the best of our knowledge, IDP-FSP is the first predictor to combine a sequence labeling algorithm with the distinct characteristics of IDRs of different lengths. Experimental results on two independent test datasets show that IDP-FSP achieves predictive performance better than or comparable to that of 26 existing state-of-the-art methods in this field, demonstrating its effectiveness.
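The abstract describes IDP-CRF and the three IDP-FSP components as sequence labeling models built on conditional random fields: instead of classifying each residue independently, a linear-chain CRF scores a whole label sequence, so the label of one residue influences its neighbours. The sketch below is a generic illustration of that idea only, assuming two labels ("O" for ordered, "D" for disordered) and hand-picked emission and transition scores. The function name `viterbi_decode` and all scores are hypothetical; the features, weights, and ensembling scheme of IDP-FSP are not described in this record and are not reproduced here.

```python
from typing import List, Sequence

# Hypothetical label set: O = ordered residue, D = disordered residue.
LABELS = ("O", "D")


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[str]:
    """Return the highest-scoring label path of a linear-chain CRF.

    emissions[t][y]   -- unnormalized score of label y at residue t
                         (in a trained CRF this comes from feature weights;
                         here it is supplied directly, as an assumption)
    transitions[a][b] -- score for label a at residue t followed by
                         label b at residue t + 1
    """
    n = len(emissions)
    if n == 0:
        return []
    k = len(LABELS)
    score = list(emissions[0])  # best path score ending in each label
    backptr: List[List[int]] = []
    for t in range(1, n):
        new_score, ptrs = [], []
        for b in range(k):
            # Best previous label a for reaching label b at residue t.
            best_a = max(range(k), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            ptrs.append(best_a)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final label.
    path = [max(range(k), key=lambda y: score[y])]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [LABELS[y] for y in path]


if __name__ == "__main__":
    # Residue 2 weakly favours "D" on its own.
    em = [[2, 0], [2, 0], [0, 0.5], [2, 0], [2, 0]]
    no_coupling = [[0, 0], [0, 0]]
    switch_penalty = [[0, -1], [-1, 0]]
    print(viterbi_decode(em, no_coupling))     # per-residue argmax
    print(viterbi_decode(em, switch_penalty))  # isolated flip suppressed
```

With zero transition scores the decoder reduces to a per-residue argmax and labels residue 2 as disordered; with a penalty for switching labels, the isolated weak signal is absorbed into the surrounding ordered segment. This coupling between adjacent residues is what distinguishes sequence labeling from residue-wise classification, which is why it suits contiguous disordered regions.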


Pages: 396-404

Page count: 9

References: 66
