Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Cited by: 2
Authors
King, RD [1]
Karwath, A
Clare, A
Dehaspe, L
Affiliations
[1] Univ Coll Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
[2] PharmaDM, B-3001 Louvain, Belgium
Keywords
machine learning; clustering; ILP; bioinformatics
DOI
10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright (C) 2000 John Wiley & Sons, Ltd.
Pages: 283-293
Page count: 11