Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering

Cited by: 1

Authors:

Li, Ying [1]

Wang, Nan [1]

Zhang, Chaoyang [1]

Perkins, Edward J. [2]

Gong, Ping [3]

Affiliations:

[1] Univ So Mississippi, Hattiesburg, MS 39401 USA

[2] US Army Engn Res & Dev Ctr, Vicksburg, MS 39180 USA

[3] SpecPro Inc, Vicksburg, MS 39180 USA

Source:

2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009

Keywords:

Biomarker; Classification; Decision tree; Support vector machine; Clustering; Earthworm Microarray; CANCER CLASSIFICATION

DOI:

10.1109/IJCBS.2009.134

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and clustering (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison statistical algorithm implemented in BRB-ArrayTools was used to infer a total of 869 genes that significantly changed relative to controls as a result of exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance in the nine classification methods, and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Results of unsupervised clustering show that the top 100 classifier genes assign the largest number of the 248 worm samples to the three reference clusters obtained by using all 14,188 filtered genes, suggesting that these top-ranked genes may be potential candidates for biomarkers. This study demonstrates that the DAC pipeline can be used to identify a small set of biomarker genes from high-dimensional datasets and generate a reliable SVM classification model for multiple classes.
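
Two steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch: ranking classifier genes by a combined weight across several classification methods, and counting how many samples a clustering assigns to the same groups as a reference clustering. The abstract does not state how the "overall weight of significance" was computed or how cluster assignments were compared, so the summed-weight rule, the best-label-mapping count, and the function names `rank_genes` and `cluster_agreement` are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import permutations


def rank_genes(weights_per_method):
    """Rank genes by summed weight across classifiers (highest first).

    weights_per_method: list of dicts, one per classification method,
    mapping gene id -> weight. Summing the weights is an assumed
    aggregation rule, not necessarily the one used in the paper.
    """
    total = defaultdict(float)
    for weights in weights_per_method:
        for gene, weight in weights.items():
            total[gene] += weight
    # Descending total weight; ties broken by gene id for determinism.
    return sorted(total, key=lambda g: (-total[g], g))


def cluster_agreement(reference, candidate):
    """Count samples placed in matching clusters under the best label mapping.

    Cluster labels are arbitrary, so every one-to-one relabelling of the
    candidate clusters onto reference clusters is tried. This is cheap for
    three clusters and assumes the candidate clustering has no more
    clusters than the reference.
    """
    ref_labels = sorted(set(reference))
    cand_labels = sorted(set(candidate))
    best = 0
    for perm in permutations(ref_labels, len(cand_labels)):
        mapping = dict(zip(cand_labels, perm))
        hits = sum(mapping[c] == r for r, c in zip(reference, candidate))
        best = max(best, hits)
    return best


# Toy data (hypothetical values, not taken from the study).
methods = [
    {"gene_a": 0.6, "gene_b": 0.4},
    {"gene_b": 0.5, "gene_c": 0.5},
    {"gene_a": 0.25, "gene_c": 0.25},
]
reference = ["control", "control", "TNT", "TNT", "RDX", "RDX"]
candidate = [2, 2, 0, 1, 1, 1]

print(rank_genes(methods))
print(cluster_agreement(reference, candidate))
```

In this reading, a gene subset (for example, the top 100 ranked genes) would be judged by how many of the 248 samples its clustering places in the same groups as the reference clustering built from all filtered genes.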


Pages: 23 / +

Page count: 2

