Comparative genomics using data mining tools

Cited by: 8
Authors
Nandi, T [1]
B-Rao, C [1]
Ramachandran, S [1]
Affiliation
[1] Ctr Biochem Technol, Funct Genom Unit, Delhi 110007, India
Keywords
comparative genomics; compositional analysis; data mining; sequence complexity
DOI
10.1007/BF02703680
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
We have analysed the genomes of representatives of the three kingdoms of life, namely archaea, eubacteria and eukaryota, using data mining tools based on compositional analyses of the protein sequences. The representatives chosen were Methanococcus jannaschii, Haemophilus influenzae and Saccharomyces cerevisiae. We have identified features of the protein evolution patterns that are common to the three genomes and features that differ between them. M. jannaschii has a greater number of proteins rich in charged amino acids, whereas S. cerevisiae has a greater number of hydrophilic proteins. Despite these differences in intrinsic compositional characteristics between the proteins from the different genomes, we have also identified certain common characteristics. We carried out exploratory Principal Component Analysis of the multivariate data on the proteins of each organism in an effort to classify the proteins into clusters. Interestingly, we found that most of the proteins in each organism cluster closely together, but there are a few 'outliers'. We focus on the outliers for functional investigation, which may aid in revealing unique features of the biology of the respective organisms.
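The abstract does not say which compositional features or software were used, so the following is only a minimal sketch of the general idea: compute per-protein amino acid composition, project the proteins onto the first principal component, and flag proteins that lie far from the main cluster. Everything named here is an assumption made for illustration: the `CHARGED` residue set (D, E, K, R, with histidine excluded), the helper names, the z-score threshold, and the made-up `TOY_PROTEINS` sequences, which are not taken from any of the three genomes. The first principal component is obtained by plain power iteration on the covariance matrix to keep the example dependency-free.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHARGED = set("DEKR")  # assumption: histidine is not counted as charged

def composition(seq):
    """Return the fractions of the 20 standard amino acids in a sequence."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]

def charged_fraction(seq):
    """Fraction of residues that belong to the (assumed) CHARGED set."""
    comp = composition(seq)
    return sum(f for a, f in zip(AMINO_ACIDS, comp) if a in CHARGED)

def first_principal_component(rows, iterations=200):
    """Return (unit loading vector, per-row scores) for the first PC.

    Uses power iteration on the sample covariance matrix of the rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Non-uniform start: composition rows sum to 1, so an all-ones vector
    # lies in the null space of the covariance matrix and would collapse.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in the data at all
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

def flag_outliers(names, scores, threshold=1.5):
    """Names whose PC1 score is more than `threshold` sample SDs from 0."""
    sd = math.sqrt(sum(s * s for s in scores) / (len(scores) - 1))
    if sd == 0.0:
        return []
    return [n for n, s in zip(names, scores) if abs(s) / sd > threshold]

# Hypothetical toy data: five near-identical sequences and one that is
# composed almost entirely of charged residues.
TOY_PROTEINS = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRA",
    "p3": "MKTAYIAKQRQISFVKSHFSRS",
    "p4": "MKTAYIAKQRQISFVKSHFSRT",
    "p5": "MKTAYIAKQRQISFVKSHFSRV",
    "odd": "DEDEDEKKRRDEDEKKRRDEDE",
}

if __name__ == "__main__":
    names = list(TOY_PROTEINS)
    rows = [composition(TOY_PROTEINS[n]) for n in names]
    _, scores = first_principal_component(rows)
    print(flag_outliers(names, scores))
```

With real data, each row could carry more features than raw composition, and a library implementation of PCA would normally replace the hand-written power iteration; the threshold for calling a protein an outlier is a judgment call that the abstract does not specify.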
Pages: 15-25
Number of pages: 11
Published in
Journal of Biosciences, 2002, 27: 15-25