Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering

Cited by: 1
Authors
Li, Ying [1]
Wang, Nan [1]
Zhang, Chaoyang [1]
Perkins, Edward J. [2]
Gong, Ping [3]
Affiliations
[1] Univ So Mississippi, Hattiesburg, MS 39401 USA
[2] US Army Engn Res & Dev Ctr, Vicksburg, MS 39180 USA
[3] SpecPro Inc, Vicksburg, MS 39180 USA
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
Biomarker; Classification; Decision tree; Support vector machine; Clustering; Earthworm microarray; Cancer classification
DOI
10.1109/IJCBS.2009.134
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Monitoring, assessment and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and clustering (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison algorithm implemented in BRB-ArrayTools was used to identify a total of 869 genes whose expression changed significantly relative to controls after exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance across the nine classification methods and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Unsupervised clustering showed that the top 100 classifier genes assigned the largest number of the 248 worm samples to the three reference clusters obtained using all 14,188 filtered genes, suggesting that these top-ranked genes are potential biomarker candidates. This study demonstrates that the DAC pipeline can identify a small set of biomarker genes from high-dimensional datasets and generate a reliable multi-class SVM classification model.
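The abstract does not give the formula behind the "overall weight of significance" or the exact agreement count used in the clustering comparison. The Python sketch below shows one plausible reading, under stated assumptions: each classification method reports a non-negative weight per gene (a hypothetical input format), weights are normalized within each method and summed across methods, and clustering agreement is the largest number of samples that land in their reference cluster after the cluster labels are matched one-to-one. The function names `rank_genes` and `max_agreement` are illustrative and do not come from the paper or from BRB-ArrayTools.

```python
from collections import defaultdict
from itertools import permutations


def rank_genes(weights_per_method):
    """Rank genes by their summed, per-method-normalized weights.

    Assumption (not from the paper): weights_per_method is a list with one
    dict per classification method, mapping gene id -> non-negative weight.
    """
    totals = defaultdict(float)
    for weights in weights_per_method:
        method_total = sum(weights.values())
        if method_total == 0:
            continue  # a method that assigns no weight contributes nothing
        for gene, weight in weights.items():
            # Normalize so every method contributes a total of 1.0.
            totals[gene] += weight / method_total
    # Highest overall score first; ties broken by gene id for determinism.
    return sorted(totals, key=lambda gene: (-totals[gene], gene))


def max_agreement(reference, predicted):
    """Count samples placed in their reference cluster, maximized over
    one-to-one relabelings of the predicted clusters (labels are arbitrary)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    ref_labels = sorted(set(reference))
    pred_labels = sorted(set(predicted))
    if len(pred_labels) > len(ref_labels):
        raise ValueError("more predicted clusters than reference clusters")
    best = 0
    # Brute force over injective label mappings; fine for a few clusters.
    for mapping in permutations(ref_labels, len(pred_labels)):
        relabel = dict(zip(pred_labels, mapping))
        hits = sum(1 for r, p in zip(reference, predicted) if relabel[p] == r)
        best = max(best, hits)
    return best


if __name__ == "__main__":
    ranked = rank_genes([
        {"g1": 2.0, "g2": 1.0, "g3": 1.0},  # toy weights from method A
        {"g2": 3.0, "g4": 1.0},             # toy weights from method B
    ])
    print(ranked)  # ['g2', 'g1', 'g3', 'g4']
    reference = ["control", "control", "TNT", "TNT", "RDX", "RDX"]
    predicted = [0, 0, 1, 1, 1, 2]  # toy clustering of the same six samples
    print(max_agreement(reference, predicted))  # 5
```

The values above are toy data, not results from the study. With real weights, `rank_genes(...)[:100]` would give a top-100 subset; samples clustered on that subset could then be scored with `max_agreement` against the reference clustering built from all filtered genes. Brute-force label matching is used because it is exact and cheap for three clusters; for many clusters a Hungarian-style assignment would scale better.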
Pages: 23 / +
Number of pages: 2
Related papers (50 in total)
  • [21] Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis
    Jaskowiak, Pablo A.
    Campello, Ricardo J. G. B.
    Costa, Ivan G.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (04) : 845 - 857
  • [22] Dimensionality reduction for microarray data using local mean based discriminant analysis
    Cui, Yan
    Zheng, Chun-Hou
    Yang, Jian
    BIOTECHNOLOGY LETTERS, 2013, 35 (03) : 331 - 336
  • [23] Challenges from Clustering Analysis to Knowledge Discovery in Molecular Biomechanics
    Ping, Loh Wei
    CURRENT BIOINFORMATICS, 2012, 7 (03) : 333 - 339
  • [24] Informative top-k class associative rule for cancer biomarker discovery on microarray data
    Ong, Huey Fang
    Mustapha, Norwati
    Hamdan, Hazlina
    Rosli, Rozita
    Mustapha, Aida
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 146
  • [25] Biomarker Signature Discovery from Mass Spectrometry Data
    Kong, Ao
    Gupta, Chinmaya
    Ferrari, Mauro
    Agostini, Marco
    Bedin, Chiara
    Bouamrani, Ali
    Tasciotti, Ennio
    Azencott, Robert
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 766 - 772
  • [26] A hybrid of clustering and quantum genetic algorithm for relevant genes selection for cancer microarray data
    Sardana, Manju
    Agrawal, R. K.
    Kaur, Baljeet
    INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2016, 20 (03) : 161 - 173
  • [27] Computational method for discovery of biomarker signatures from large, complex data sets
    Makarov, Vladimir
    Gorlin, Alex
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2018, 76 : 161 - 168
  • [28] COMPRESSIVE REGULARIZED DISCRIMINANT ANALYSIS OF HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO MICROARRAY STUDIES
    Tabassum, Muhammad Naveed
    Ollila, Esa
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 4204 - 4208
  • [29] Fuzzy mixed-prototype clustering algorithm for microarray data analysis
    Liu, Jin
    Pham, Tuan D.
    Yan, Hong
    Liang, Zhizhen
    NEUROCOMPUTING, 2018, 276 : 42 - 54
  • [30] Performance analysis of clustering techniques over microarray data: A case study
    Dash, Rasmita
    Misra, Bijan Bihari
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2018, 493 : 162 - 176