An Improved Profile-Level Domain Linker Propensity Index for Protein Domain Boundary Prediction

Cited by: 15
Authors
Zhang, Yanfeng [1]
Liu, Bin [1]
Dong, Qiwen [2]
Jin, Victor X. [3]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Shenzhen, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[3] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
Keywords
Domain boundary; domain linker; multiple sequence alignments; sequence-based prediction; SUPPORT VECTOR MACHINES; AMINO-ACID-COMPOSITION; GO-PSEAA PREDICTOR; FUNCTIONAL DOMAIN; MEMBRANE-PROTEINS; STRUCTURAL CLASS; WEB-SERVER; ENSEMBLE CLASSIFIER; SEQUENCES; FAMILY;
DOI
10.2174/092986611794328717
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein domain boundary prediction is critical for understanding protein structure and function. In this study, we present a novel method, the order profile domain linker propensity index (OPI), which uses evolutionary information extracted from protein sequence frequency profiles calculated from multiple sequence alignments. OPI first converts a protein sequence into smoothed, normalized numeric order profiles, from which the domain linkers are predicted. By discriminating between the different frequencies of the amino acids in the protein sequence frequency profiles, OPI clearly outperforms our previous method, the binary profile domain linker propensity index (PDLI). We tested the new method on two datasets, SCOP-1 and SCOP-2, and achieved precisions of 0.82 and 0.91, respectively. OPI also outperforms other residue-level and profile-level indexes, as well as other state-of-the-art methods.
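The abstract does not give the OPI formulas, so the following is only an illustrative sketch of the general idea it describes: rank the amino acids at each position of a frequency profile, turn the ranking into a per-residue score, then smooth and normalize that score. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the `propensity` table, the rank weighting, the `top_k` cutoff, the moving-average window, and the min-max normalization.

```python
from typing import Dict, List


def order_profile(freq_column: Dict[str, float]) -> List[str]:
    """Rank the amino acids at one position from most to least frequent.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(freq_column, key=lambda aa: (-freq_column[aa], aa))


def raw_scores(profile: List[Dict[str, float]],
               propensity: Dict[str, float],
               top_k: int = 3) -> List[float]:
    """Score each position from its top_k ranked amino acids.

    ASSUMPTION: a linear rank weight (top_k for rank 1, down to 1) and a
    hypothetical per-residue linker propensity table; the paper may differ.
    """
    scores = []
    for column in profile:
        ranked = order_profile(column)[:top_k]
        total = sum(propensity.get(aa, 0.0) * (top_k - rank)
                    for rank, aa in enumerate(ranked))
        scores.append(total)
    return scores


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def normalize(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Under these assumptions, a profile would be processed as `normalize(smooth(raw_scores(profile, propensity)))`, and positions with high values would be candidate linker residues. The threshold for calling a linker is left open because the abstract does not state one.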
Pages: 7-16
Page count: 10