Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Cited by: 2
Authors
King, RD [1]
Karwath, A
Clare, A
Dehaspe, L
Affiliations
[1] Univ Coll Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
[2] PharmaDM, B-3001 Louvain, Belgium
Keywords
machine learning; clustering; ILP; bioinformatics
DOI
10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright (C) 2000 John Wiley & Sons, Ltd.
Pages: 283-293
Number of pages: 11
Related papers
39 in total
  • [11] NMRDSP: An Accurate Prediction of Protein Shape Strings from NMR Chemical Shifts and Sequence Data
    Mao, Wusong
    Cong, Peisheng
    Wang, Zhiheng
    Lu, Longjian
    Zhu, Zhongliang
    Li, Tonghua
    PLOS ONE, 2013, 8 (12):
  • [12] Conserved codon composition of ribosomal protein coding genes in Escherichia coli, Mycobacterium tuberculosis and Saccharomyces cerevisiae: lessons from supervised machine learning in functional genomics
    Lin, K
    Kuang, YY
    Joseph, JS
    Kolatkar, PR
    NUCLEIC ACIDS RESEARCH, 2002, 30 (11) : 2599 - 2607
  • [13] A SPECIES-SPECIFIC NUCLEOTIDE-SEQUENCE OF MYCOBACTERIUM-TUBERCULOSIS ENCODES A PROTEIN THAT EXHIBITS HEMOLYTIC-ACTIVITY WHEN EXPRESSED IN ESCHERICHIA-COLI
    LEAO, SC
    ROCHA, CL
    MURILLO, LA
    PARRA, CA
    PATARROYO, ME
    INFECTION AND IMMUNITY, 1995, 63 (11) : 4301 - 4306
  • [14] Deeper investigation into the utility of functional class scoring in missing protein prediction from proteomics data
    Zhao, Yaxing
    Sue, Andrew Chi-Hau
    Bin Goh, Wilson Wen
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2019, 17 (02)
  • [15] ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data
    Bessonov, Kyrylo
    Laing, Chad
    Robertson, James
    Yong, Irene
    Ziebell, Kim
    Gannon, Victor P. J.
    Nichani, Anil
    Arya, Gitanjali
    Nash, John H. E.
    Christianson, Sara
    MICROBIAL GENOMICS, 2021, 7 (12):
  • [16] PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data
    Hawkins, Troy
    Chitale, Meghana
    Luban, Stanislav
    Kihara, Daisuke
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2009, 74 (03) : 566 - 582
  • [17] Using structural knowledge in the protein data bank to inform the search for potential host-microbe protein interactions in sequence space: application to Mycobacterium tuberculosis
    Mahajan, Gaurang
    Mande, Shekhar C.
    BMC BIOINFORMATICS, 2017, 18
  • [19] Robust and accurate prediction of protein self-interactions from amino acids sequence using evolutionary information
    An, Ji-Yong
    You, Zhu-Hong
    Chen, Xing
    Huang, De-Shuang
    Yan, Guiying
    Wang, Da-Fu
    MOLECULAR BIOSYSTEMS, 2016, 12 (12) : 3702 - 3710
  • [20] CONSERVATION ANALYSIS AND STRUCTURE PREDICTION OF THE PROTEIN SERINE/THREONINE PHOSPHATASES - SEQUENCE SIMILARITY WITH DIADENOSINE TETRAPHOSPHATASE FROM ESCHERICHIA-COLI SUGGESTS HOMOLOGY TO THE PROTEIN PHOSPHATASES
    BARTON, GJ
    COHEN, PTW
    BARFORD, D
    EUROPEAN JOURNAL OF BIOCHEMISTRY, 1994, 220 (01) : 225 - 237