Finding rule groups to classify high dimensional gene expression datasets

Cited by: 9
Authors
An, Jiyuan [1]
Chen, Yi-Ping Phoebe [1]
Affiliations
[1] Deakin Univ, Fac Sci & Technol, Melbourne, Vic 3125, Australia
Funding
Australian Research Council
Keywords
Gene expression datasets; Microarray data analysis; Classification; Prediction; Cancer
DOI
10.1016/j.compbiolchem.2008.07.031
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Microarray data provide quantitative information about the transcription profile of cells. Machine learning methodology has increasingly attracted bioinformatics researchers as a way to analyze microarray datasets, and some machine learning approaches are widely used to classify and mine biological datasets. However, many gene expression datasets have extremely high dimensionality, so traditional machine learning methods cannot be applied to them effectively and efficiently. This paper proposes a robust algorithm for finding rule groups to classify gene expression datasets. Unlike most classification algorithms, which select dimensions (genes) heuristically to form rule groups that identify classes such as cancerous and normal tissues, our algorithm guarantees finding the best-k dimensions (genes) to form rule groups for the classification of expression datasets. Our experiments show that the rule groups obtained by our algorithm have higher accuracy than those of other classification approaches. © 2008 Elsevier Ltd. All rights reserved.
Pages: 108-113
Number of pages: 6