CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

Cited by: 250
Authors
Park, Byung H. [1]
Karpinets, Tatiana V. [2, 3]
Syed, Mustafa H. [2]
Leuze, Michael R. [1]
Uberbacher, Edward C. [2]
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
[2] Oak Ridge Natl Lab, Biosci Div, Oak Ridge, TN USA
[3] Univ Tennessee, Dept Plant Sci, Knoxville, TN 37996 USA
Keywords
biofuel; carbohydrate-active enzymes; computational annotation; protein families; BINDING MODULES; HOMOLOGY;
DOI
10.1093/glycob/cwq106
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
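The second approach in the abstract relies on association rules that link protein family domains to CAZy families. As a rough illustration of that idea only, and not the authors' implementation, the sketch below counts how often a domain and a family co-occur on the same sequence and keeps rules whose support and confidence clear a threshold. The function name `mine_rules`, the thresholds, the domain labels (`DOM_A`, `DOM_B`, `DOM_C`), and the toy records are all hypothetical; the abstract does not state which thresholds or rule-mining variant CAT uses.

```python
from collections import Counter
from itertools import product


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-item rules "domain -> CAZy family".

    records: iterable of (domains, families) pairs, one per annotated
    sequence, where both members are sets of string labels.
    Returns a sorted list of (domain, family, support, confidence).
    """
    domain_counts = Counter()  # sequences containing each domain
    pair_counts = Counter()    # sequences containing both domain and family
    for domains, families in records:
        domain_counts.update(domains)
        pair_counts.update(product(domains, families))

    rules = []
    for (domain, family), support in pair_counts.items():
        # Confidence: fraction of sequences with the domain that also
        # carry the family annotation.
        confidence = support / domain_counts[domain]
        if support >= min_support and confidence >= min_confidence:
            rules.append((domain, family, support, confidence))
    return sorted(rules)


# Made-up annotations; the domain labels are placeholders, not real
# protein family accessions.
records = [
    ({"DOM_A"}, {"GH5"}),
    ({"DOM_A", "DOM_B"}, {"GH5", "CBM2"}),
    ({"DOM_A"}, {"GH5"}),
    ({"DOM_B"}, {"CBM2"}),
    ({"DOM_B", "DOM_C"}, {"CBM2", "GH9"}),
]

rules = mine_rules(records)
for domain, family, support, confidence in rules:
    print(f"{domain} -> {family} (support={support}, confidence={confidence:.2f})")
```

On this toy input only `DOM_A -> GH5` and `DOM_B -> CBM2` survive, because the cross pairs that arise from multi-domain sequences fall below both thresholds. A rule set of this kind could then be used to propose a CAZy family for a new sequence from its domain hits, complementing a direct similarity search.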
Pages: 1574-1584
Page count: 11