Using functional domain composition to predict enzyme family classes

Cited by: 49
Authors
Cai, YD [1]
Chou, KC
Affiliations
[1] Univ Manchester, Inst Sci & Technol, Biomol Sci Dept, Manchester M60 1QD, Lancs, England
[2] Gordon Life Sci Inst, San Diego, CA 92130 USA
[3] TIBDD, Tianjin, Peoples R China
Keywords
classification of enzyme commission; enzymatic attribute; functional domain composition; 20% threshold cutoff; nearest neighbor predictor; bioinformatics; proteomics;
DOI
10.1021/pr049835p
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
According to their main EC (Enzyme Commission) numbers, enzymes are classified into six main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. A new method has been developed to predict the enzymatic attribute of proteins by using the functional domain composition to represent a given protein sequence. The advantage of doing so is that both the sequence-order-related features and the function-related features are naturally incorporated in the predictor. As a demonstration, the jackknife cross-validation test was performed on a dataset in which the proteins share less than 20% sequence identity with one another, in order to remove homology bias. The overall success rate thus obtained was 85% in identifying the enzyme family classes (including the identification of nonenzyme protein sequences). The success rate is significantly higher than those obtained by other methods on such a stringent dataset. This indicates that using the functional domain composition to represent protein samples for statistical prediction is very promising and may become a powerful tool in bioinformatics and proteomics.
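The workflow named in the abstract and keywords (functional domain composition, nearest neighbor predictor, jackknife test) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The domain identifiers (`D1` to `D6`), the toy proteins, the binary encoding, and the cosine similarity measure are all assumptions made for illustration; the abstract does not specify the encoding or the distance used.

```python
from math import sqrt

def domain_vector(domains, vocabulary):
    """Binary vector: 1 if the protein contains the domain, else 0 (assumed encoding)."""
    present = set(domains)
    return [1 if d in present else 0 for d in vocabulary]

def cosine(u, v):
    """Cosine similarity; an assumed measure, not taken from the paper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbor(query, vectors, labels, skip=None):
    """Return the label of the most similar stored vector, ignoring index `skip`."""
    best_label, best_score = None, -1.0
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        if i == skip:
            continue
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def jackknife_accuracy(vectors, labels):
    """Leave each sample out in turn and predict it from the remaining samples."""
    hits = sum(
        nearest_neighbor(vec, vectors, labels, skip=i) == labels[i]
        for i, vec in enumerate(vectors)
    )
    return hits / len(vectors)

# Hypothetical domain vocabulary and toy proteins (not real data).
vocabulary = ["D1", "D2", "D3", "D4", "D5", "D6"]
proteins = [
    (["D1", "D2"], "oxidoreductase"),
    (["D1"], "oxidoreductase"),
    (["D3", "D4"], "hydrolase"),
    (["D3"], "hydrolase"),
    (["D5", "D6"], "nonenzyme"),
    (["D5"], "nonenzyme"),
]
vectors = [domain_vector(domains, vocabulary) for domains, _ in proteins]
labels = [label for _, label in proteins]

accuracy = jackknife_accuracy(vectors, labels)
print(f"jackknife success rate on toy data: {accuracy:.2f}")
```

Because the toy classes share no domains, every left-out sample is recovered correctly here; the figure says nothing about performance on real, low-identity datasets such as the one described in the abstract.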
Pages: 109-111
Number of pages: 3