Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering

Times cited: 1

Authors:

Li, Ying [1]

Wang, Nan [1]

Zhang, Chaoyang [1]

Perkins, Edward J. [2]

Gong, Ping [3]

Affiliations:

[1] Univ So Mississippi, Hattiesburg, MS 39401 USA

[2] US Army Engn Res & Dev Ctr, Vicksburg, MS 39180 USA

[3] SpecPro Inc, Vicksburg, MS 39180 USA

Source:

2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009

Keywords:

Biomarker; Classification; Decision tree; Support vector machine; Clustering; Earthworm Microarray; CANCER CLASSIFICATION;

DOI:

10.1109/IJCBS.2009.134

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Monitoring, assessment and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and clustering (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison statistical algorithm implemented in BRB-ArrayTools was used to identify 869 genes whose expression changed significantly relative to controls after exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance across the nine classification methods and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Results of unsupervised clustering show that the top 100 classifier genes assign the largest number of the 248 worm samples to the three reference clusters obtained with all 14,188 filtered genes, suggesting that these top-ranked genes are potential biomarker candidates. This study demonstrates that the DAC pipeline can identify a small set of biomarker genes from high-dimensional datasets and generate a reliable multi-class SVM classification model.
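The abstract does not say how the "overall weight of significance" across the nine classifiers is computed, nor exactly how sample assignment to the reference clusters is scored. The following is a minimal sketch under stated assumptions, not the authors' implementation: each method's gene weights are normalized by that method's largest weight and summed to rank genes (an assumed aggregation rule), and clustering agreement is counted as the number of samples that land in their reference cluster under the best one-to-one relabeling of predicted clusters (an assumed reading of "assign the largest number of samples"). The function names `rank_genes` and `cluster_agreement`, the method names, and the sample data are all hypothetical.

```python
from itertools import permutations


def rank_genes(weights_by_method, top_k):
    """Return the top_k genes ranked by summed, per-method-normalized weight.

    ASSUMPTION: weights_by_method maps a hypothetical method name to
    {gene: weight}. Each method's weights are divided by that method's largest
    weight so no single classifier dominates; a gene a method did not use
    contributes 0 for that method.
    """
    totals = {}
    for weights in weights_by_method.values():
        max_w = max(weights.values(), default=0.0)
        if max_w <= 0:
            continue  # skip methods that assigned no positive weight
        for gene, w in weights.items():
            totals[gene] = totals.get(gene, 0.0) + w / max_w
    # Highest total first; ties broken by gene name so the order is deterministic.
    ranked = sorted(totals, key=lambda g: (-totals[g], g))
    return ranked[:top_k]


def cluster_agreement(reference, predicted):
    """Largest number of samples whose predicted cluster maps onto their
    reference cluster, over all one-to-one relabelings of predicted clusters.

    ASSUMPTION: this matched-sample count stands in for the study's criterion.
    """
    ref_labels = sorted(set(reference))
    pred_labels = sorted(set(predicted))
    # Pad with None so surplus predicted clusters can map to "no reference cluster".
    targets = ref_labels + [None] * max(0, len(pred_labels) - len(ref_labels))
    best = 0
    for perm in permutations(targets, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        hits = sum(mapping[p] == r for r, p in zip(reference, predicted))
        best = max(best, hits)
    return best


if __name__ == "__main__":
    # Invented toy data, not values from the study.
    weights = {
        "tree_a": {"g1": 4.0, "g2": 2.0},
        "tree_b": {"g2": 1.0, "g3": 0.5},
    }
    print(rank_genes(weights, top_k=2))  # ['g2', 'g1']
    reference = ["C", "C", "T", "T", "R", "R"]  # control / TNT / RDX
    predicted = [0, 0, 1, 1, 1, 2]              # cluster IDs from some clustering
    print(cluster_agreement(reference, predicted))  # 5
```

With three groups (control, TNT, RDX) there are only 3! = 6 relabelings to try, so brute force is adequate; for many clusters an optimal assignment solver such as the Hungarian algorithm would scale better.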

Pages: 23 / +

Page count: 2