Induction of comprehensible models for gene expression datasets by subgroup discovery methodology

Cited by: 31
Authors
Gamberger, D
Lavrac, N
Zelezny, F
Tolar, J
Affiliations
[1] Rudjer Boskovic Inst, Informat Syst Lab, Zagreb 10000, Croatia
[2] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia
[3] Nova Gorica Polytech, Vipavska 13, Nova Gorica 5000, Slovenia
[4] Czech Tech Univ, FEL, Dept Cybernet, Prague 16627, Czech Republic
[5] Univ Wisconsin, Sch Med, Dept Biostat, Madison, WI 53706 USA
[6] Univ Minnesota, Sch Med, Inst Human Genet, Minneapolis, MN 55455 USA
Keywords
gene expression measurements; disease markers; subgroup discovery; machine learning; comprehensible classification;
DOI
10.1016/j.jbi.2004.07.007
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and the shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide. The goal of this study is to apply to this domain the methodology of constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations. (C) 2004 Elsevier Inc. All rights reserved.
Pages: 269-284
Number of pages: 16