Learning to classify species with barcodes

Times cited: 68
Authors
Bertolazzi, Paola [1]
Felici, Giovanni [1]
Weitschek, Emanuel [1]
Affiliation
[1] CNR, Ist Analisi Sistemi & Informat Antonio Ruberti, I-00185 Rome, Italy
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
MITOCHONDRIAL COI; DNA; GENES
DOI
10.1186/1471-2105-10-S14-S7
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: According to many field experts, specimen classification based on morphological keys needs to be supported by automated techniques based on the analysis of DNA fragments. The most successful results in this area have been obtained from a particular fragment of mitochondrial DNA, the cytochrome c oxidase I (COI) gene (the "barcode"). Since 2004 the Consortium for the Barcode of Life (CBOL) has promoted the collection of barcode specimens and the development of methods to analyze the barcode for several tasks, including the identification of rules that correctly classify an individual into its species by reading its barcode.
Results: We adopt a Logic Mining method based on two optimization models and present the results obtained on two datasets in which a number of COI fragments are used to describe individuals belonging to different species. The proposed method exhibits high correct recognition rates on a training-testing split of the available data while using a small proportion of the available information (e.g., approximately 97% correct recognition when only 20 of the 648 available sites are used). The method provides compact formulas on the values (A, C, G, T) at the selected sites that synthesize the characteristics of each species, information that is relevant to taxonomists.
Conclusion: We have presented a Logic Mining technique designed to analyze barcode data and to provide detailed output of interest to taxonomists and to the barcode community represented in the CBOL Consortium. The method has proven to be effective, efficient and precise.
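The abstract describes two steps: choose a small subset of barcode sites, then express each species as a compact formula over the nucleotide values (A, C, G, T) at those sites. The following is a minimal Python sketch of that idea on hypothetical toy data. It does not reproduce the authors' two optimization models. As an assumption, site selection here uses a greedy set-cover heuristic that tries to separate every pair of individuals from different species, and each species formula simply records the nucleotides observed at each selected site. All function names (`select_sites`, `species_formulas`, `formula_to_text`, `classify`) and the toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy format: (species label, aligned barcode over A/C/G/T).
Dataset = List[Tuple[str, str]]
Formula = Dict[int, Set[str]]  # site index -> nucleotides allowed at that site


def select_sites(data: Dataset, max_sites: int) -> List[int]:
    """Greedy set-cover heuristic (a stand-in for an exact optimization model):
    repeatedly pick the site that separates the most cross-species pairs of
    individuals not yet separated by an earlier pick."""
    length = len(data[0][1])
    pending = {(i, j) for i, j in combinations(range(len(data)), 2)
               if data[i][0] != data[j][0]}
    chosen: List[int] = []
    while pending and len(chosen) < max_sites:
        candidates = [s for s in range(length) if s not in chosen]
        if not candidates:
            break
        # How many still-unseparated pairs differ at each candidate site.
        gains = {s: sum(data[i][1][s] != data[j][1][s] for i, j in pending)
                 for s in candidates}
        best = max(candidates, key=gains.__getitem__)  # first site wins ties
        if gains[best] == 0:
            break  # no remaining site separates any pending pair
        chosen.append(best)
        pending = {(i, j) for i, j in pending
                   if data[i][1][best] == data[j][1][best]}
    return chosen


def species_formulas(data: Dataset, sites: Sequence[int]) -> Dict[str, Formula]:
    """For each species, collect the nucleotides seen at each selected site.
    The species formula is the AND over sites of (value at site in that set)."""
    formulas: Dict[str, Formula] = {}
    for label, seq in data:
        clauses = formulas.setdefault(label, {s: set() for s in sites})
        for s in sites:
            clauses[s].add(seq[s])
    return formulas


def formula_to_text(clauses: Formula) -> str:
    """Render a formula with 1-based site positions, e.g. 'pos1 in {A} AND pos2 in {T}'."""
    return " AND ".join(f"pos{s + 1} in {{{','.join(sorted(vals))}}}"
                        for s, vals in sorted(clauses.items()))


def classify(seq: str, formulas: Dict[str, Formula]) -> str:
    """Assign the species whose formula has the most satisfied clauses;
    ties go to the alphabetically first label."""
    def satisfied(label: str) -> int:
        return sum(seq[s] in vals for s, vals in formulas[label].items())
    return max(sorted(formulas), key=satisfied)


if __name__ == "__main__":
    # Hypothetical six-site "barcodes" for three species.
    train = [("sp1", "ACGTAC"), ("sp1", "ACGTAA"), ("sp2", "ATGTCC"),
             ("sp2", "ATGACC"), ("sp3", "GCGTCC")]
    sites = select_sites(train, max_sites=3)
    formulas = species_formulas(train, sites)
    for label in sorted(formulas):
        print(label, ":", formula_to_text(formulas[label]))
    print(classify("ATGAAC", formulas))
```

On this toy data the greedy step stops after two sites because every cross-species pair is already separated, which mirrors, on a tiny scale, the abstract's point that a few sites can carry most of the discriminating information. A real analysis would replace the greedy step with an exact model and evaluate on a held-out test split.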
Pages: 12