Using functional domain composition to predict enzyme family classes

Times cited: 50

Authors:

Cai, YD [1]

Chou, KC

Affiliations:

[1] Univ Manchester, Inst Sci & Technol, Biomol Sci Dept, Manchester M60 1QD, Lancs, England

[2] Gordon Life Sci Inst, San Diego, CA 92130 USA

[3] TIBDD, Tianjin, Peoples R China

Source:

JOURNAL OF PROTEOME RESEARCH | 2005 / Vol. 4 / No. 01

Keywords:

classification of enzyme commission; enzymatic attribute; functional domain composition; 20% threshold cutoff; nearest neighbor predictor; bioinformatics; proteomics;

DOI:

10.1021/pr049835p

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

According to their main EC (Enzyme Commission) numbers, enzymes are classified into six main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. A new method has been developed to predict the enzymatic attribute of proteins by introducing the functional domain composition to represent a given protein sequence. The advantage of doing so is that both sequence-order-related features and function-related features are naturally incorporated into the predictor. As a demonstration, the jackknife cross-validation test was performed on a dataset in which the proteins share less than 20% sequence identity with one another, in order to eliminate homology bias. The overall success rate thus obtained was 85% in identifying the enzyme family classes (including the identification of non-enzyme protein sequences). This success rate is significantly higher than those obtained by other methods on such a stringent dataset, indicating that using the functional domain composition to represent protein samples for statistical prediction is very promising and will become a powerful tool in bioinformatics and proteomics.
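The abstract combines two ideas: each protein is represented by its functional domain composition, and the resulting predictor (the keywords name a nearest neighbor predictor) is scored by jackknife cross-validation. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation. It assumes each protein is encoded as a 0/1 vector marking which entries of a domain vocabulary (for example, InterPro entries) it contains, and that the distance between two proteins is the cosine-style measure 1 - cos(theta). The domain IDs `D1`..`D4` and the toy labels are hypothetical placeholders; in a real run the labels would be the six main EC classes plus a non-enzyme category.

```python
import math
from typing import List, Sequence, Set

# Assumption: a protein is a 0/1 vector over a fixed domain vocabulary.
Vector = List[int]


def domain_composition(domains: Set[str], vocabulary: Sequence[str]) -> Vector:
    """Mark 1 for each vocabulary entry the protein contains, else 0."""
    return [1 if d in domains else 0 for d in vocabulary]


def cosine_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Assumed distance 1 - cos(theta); a protein with no domains is at distance 1."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def nearest_neighbor(query: Sequence[int], train: List[Vector], labels: List[str]) -> str:
    """Return the label of the closest training vector (first one wins on ties)."""
    best = min(range(len(train)), key=lambda i: cosine_distance(query, train[i]))
    return labels[best]


def jackknife_success_rate(vectors: List[Vector], labels: List[str]) -> float:
    """Leave each protein out in turn, predict it from all the others, count hits."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor(vectors[i], rest_vectors, rest_labels) == labels[i]:
            correct += 1
    return correct / len(vectors)


# Hypothetical toy data: placeholder domain IDs, not real InterPro entries.
VOCAB = ["D1", "D2", "D3", "D4"]
TOY = [({"D1", "D2"}, "hydrolase"), ({"D1"}, "hydrolase"),
       ({"D3", "D4"}, "ligase"), ({"D4"}, "ligase")]

if __name__ == "__main__":
    vectors = [domain_composition(doms, VOCAB) for doms, _ in TOY]
    labels = [label for _, label in TOY]
    print(jackknife_success_rate(vectors, labels))  # 1.0 on this toy set
    print(nearest_neighbor(domain_composition({"D2"}, VOCAB), vectors, labels))  # hydrolase
```

The sketch leaves out dataset preparation: filtering the benchmark so that no two proteins share 20% or more sequence identity would happen before encoding, and the reported 85% figure comes from the paper's own dataset, not from anything this toy example computes.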


Pages: 109-111

Number of pages: 3

References (31 in total):

[1] [Anonymous], 1992, ENZYME NOMENCLATURE

[2] The InterPro database, an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R; Attwood, TK; Bairoch, A; Bateman, A; Birney, E; Biswas, M; Bucher, P; Cerutti, T; Corpet, F; Croning, MDR; Durbin, R; Falquet, L; Fleischmann, W; Gouzy, J; Hermjakob, H; Hulo, N; Jonassen, I; Kahn, D; Kanapin, A; Karavidopoulou, Y; Lopez, R; Marx, B; Mulder, NJ; Oinn, TM; Pagni, M; Servant, F; Sigrist, CJA; Zdobnov, EM.
NUCLEIC ACIDS RESEARCH, 2001, 29 (01): 37-40

[3] Bahar I, 1997, PROTEINS, V29, P172, DOI 10.1002/(SICI)1097-0134(199710)29:2<172::AID-PROT5>3.3.CO;2-D

[5] The SWISS-PROT protein sequence data bank and its supplement TrEMBL [J].
Bairoch, A; Apweiler, R.
NUCLEIC ACIDS RESEARCH, 1997, 25 (01): 31-36

[6] The ENZYME database in 2000 [J].
Bairoch, A.
NUCLEIC ACIDS RESEARCH, 2000, 28 (01): 304-305

[7] Enzyme family classification by support vector machines [J].
Cai, CZ; Han, LY; Ji, ZL; Chen, YZ.
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2004, 55 (01): 66-76

[8] CHANDONIA JM, 1995, PROTEIN SCI, V4, P275

[9] A JOINT PREDICTION OF THE FOLDING TYPES OF 1490 HUMAN PROTEINS FROM THEIR GENETIC CODONS [J].
CHOU, JJW; ZHANG, CT.
JOURNAL OF THEORETICAL BIOLOGY, 1993, 161 (02): 251-262

[10] PREDICTION OF PROTEIN STRUCTURAL CLASSES [J].
CHOU, KC; ZHANG, CT.
CRITICAL REVIEWS IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, 1995, 30 (04): 275-349
