Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Cited by: 2
Authors
King, RD [1]
Karwath, A
Clare, A
Dehaspe, L
Affiliations
[1] Univ Coll Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
[2] PharmaDM, B-3001 Louvain, Belgium
Keywords
machine learning; clustering; ILP; bioinformatics
DOI
10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. The method is based on a combination of inductive logic programming (ILP) clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes and identify biologically interpretable rules that predict protein functional class from information available from the sequence alone. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of remote homology detection, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright (C) 2000 John Wiley & Sons, Ltd.
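The abstract reports two different figures for the learned rules: how many unassigned ORFs they make a prediction for (65% and 24%) and how accurate those predictions are estimated to be (60-80%). The Python sketch below shows, under stated assumptions, how those two quantities could be computed for an ordered list of rules. It is not the authors' rule-learning or ILP clustering procedure. The feature names (`length`, `frac_hydrophobic`), the class names, the thresholds and the toy ORFs are all hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ORF:
    name: str
    features: dict                      # hypothetical sequence-derived features
    true_class: Optional[str] = None    # None = no assigned function


@dataclass
class Rule:
    condition: Callable[[dict], bool]   # test over the ORF's features
    predicted_class: str


def predict(rules: List[Rule], orf: ORF) -> Optional[str]:
    """Return the class of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.condition(orf.features):
            return rule.predicted_class
    return None


def coverage(rules: List[Rule], orfs: List[ORF]) -> float:
    """Fraction of ORFs for which at least one rule fires."""
    if not orfs:
        return 0.0
    hits = sum(1 for orf in orfs if predict(rules, orf) is not None)
    return hits / len(orfs)


def accuracy(rules: List[Rule], labelled: List[ORF]) -> float:
    """Fraction of correct predictions among labelled ORFs that some rule covers."""
    fired = [(predict(rules, o), o.true_class) for o in labelled]
    fired = [(p, t) for p, t in fired if p is not None]
    if not fired:
        return 0.0
    return sum(p == t for p, t in fired) / len(fired)


# Hypothetical rules and toy data, for illustration only.
RULES = [
    Rule(lambda f: f["length"] > 400 and f["frac_hydrophobic"] > 0.45, "transport"),
    Rule(lambda f: f["length"] < 150, "ribosomal"),
]

LABELLED = [
    ORF("a", {"length": 500, "frac_hydrophobic": 0.50}, "transport"),
    ORF("b", {"length": 100, "frac_hydrophobic": 0.30}, "ribosomal"),
    ORF("c", {"length": 120, "frac_hydrophobic": 0.20}, "metabolism"),
    ORF("d", {"length": 300, "frac_hydrophobic": 0.30}, "metabolism"),
]

UNASSIGNED = [
    ORF("x", {"length": 600, "frac_hydrophobic": 0.60}),
    ORF("y", {"length": 250, "frac_hydrophobic": 0.40}),
]

print(f"coverage of unassigned ORFs: {coverage(RULES, UNASSIGNED):.2f}")  # 0.50
print(f"accuracy on labelled ORFs:   {accuracy(RULES, LABELLED):.2f}")    # 0.67
```

Coverage and accuracy are kept as separate functions because they are measured on different sets: coverage uses ORFs with no assigned function, while accuracy can only be estimated on ORFs whose class is already known.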
Pages: 283-293
Number of pages: 11