Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering

Cited by: 1
Authors
Li, Ying [1 ]
Wang, Nan [1 ]
Zhang, Chaoyang [1 ]
Perkins, Edward J. [2 ]
Gong, Ping [3 ]
Affiliations
[1] Univ So Mississippi, Hattiesburg, MS 39401 USA
[2] US Army Engn Res & Dev Ctr, Vicksburg, MS 39180 USA
[3] SpecPro Inc, Vicksburg, MS 39180 USA
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
Biomarker; Classification; Decision tree; Support vector machine; Clustering; Earthworm Microarray; CANCER CLASSIFICATION;
DOI
10.1109/IJCBS.2009.134
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and clustering (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison statistical algorithm implemented in BRB-ArrayTools was used to infer a total of 869 genes that changed significantly relative to controls as a result of exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance in the nine classification methods and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Results of unsupervised clustering show that the top 100 classifier genes assign the largest number of the 248 worm samples to the three reference clusters obtained by using all 14,188 filtered genes, suggesting that these top-ranked genes may be potential candidates for biomarkers. This study demonstrates that the DAC pipeline can be used to identify a small set of biomarker genes from high-dimensional datasets and to generate a reliable SVM classification model for multiple classes.
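The ranking step described in the abstract (ordering classifier genes by their overall weight across the nine classification methods) can be illustrated with a minimal sketch. The abstract does not say how the overall weight was computed, so summing per-method weights is an assumption made here for illustration only. The function names, method names, gene identifiers and weights below are all hypothetical.

```python
from collections import defaultdict


def rank_classifier_genes(method_weights):
    """Rank genes by their summed weight across classification methods.

    method_weights maps a method name to a {gene_id: weight} dict.
    Summation is an assumed aggregation rule, not the paper's stated one.
    """
    totals = defaultdict(float)
    for weights in method_weights.values():
        for gene, weight in weights.items():
            totals[gene] += weight
    # Highest total weight first; ties are broken by gene id so the
    # ordering is deterministic.
    return sorted(totals, key=lambda gene: (-totals[gene], gene))


def top_k(ranked_genes, k):
    """Return the k top-ranked genes (e.g. a candidate biomarker subset)."""
    return ranked_genes[:k]


# Hypothetical toy input: three methods instead of nine, three genes.
example = {
    "tree_A": {"gene1": 0.5, "gene2": 0.25},
    "tree_B": {"gene2": 0.5, "gene3": 0.125},
    "tree_C": {"gene1": 0.25, "gene3": 0.25},
}
ranked = rank_classifier_genes(example)
subset = top_k(ranked, 2)
```

In a full pipeline, a subset such as `subset` would then be used to select the expression columns on which an SVM is trained or a clustering is run; that part is omitted here because it depends on the chosen machine learning library.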
Pages: 23 / +
Page count: 2