MaskedPainter: Feature selection for microarray data analysis

Cited by: 11
Authors
Apiletti, Daniele [1]
Baralis, Elena [1]
Bruno, Giulia [1]
Fiori, Alessandro [1]
Affiliation
[1] Politecn Torino, Dipartimento Automat & Informat, I-10129 Turin, Italy
Keywords
Feature selection; Microarray analysis; Tumor classification; Data mining; Gene selection; Colon cancer; Classification; Expression; Prediction
DOI
10.3233/IDA-2012-0546
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Selecting a small number of discriminative genes from among thousands is a fundamental task in microarray data analysis. Effective feature selection allows biologists to investigate only a subset of genes instead of the entire set, thereby avoiding insignificant, noisy, and redundant features. This paper presents MaskedPainter, a feature selection method for gene expression data. The method measures the ability of each gene to classify samples belonging to different classes and ranks genes by computing an overlap score. A density-based technique is used to smooth the effect of outliers on the overlap score computation. As in other approaches, the number of selected genes can be set by the user. Alternatively, the algorithm can automatically detect the minimum set of genes that yields the best classification coverage of the training set samples. The effectiveness of the approach has been demonstrated through an empirical study on public microarray datasets with different characteristics. Experimental results show that the proposed approach yields higher classification accuracy than widely used feature selection techniques.
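The abstract does not give the exact definitions of the overlap score, the density-based smoothing, or the coverage criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes: (1) each class's expression range for a gene is a robust interval, median ± k·MAD, used here as a simple stand-in for the paper's density-based outlier smoothing; (2) a gene "covers" a sample when the sample's value falls inside its own class interval and outside every other class interval, and the overlap score is the fraction of samples a gene does not cover; (3) the minimum gene set is approximated with a greedy set cover. All function names (`class_intervals`, `coverage_matrix`, `rank_genes`, `minimum_gene_set`) and the parameter `k` are hypothetical.

```python
import numpy as np


def class_intervals(x, y, k=2.0):
    """Robust per-class range of one gene's expression values.

    Assumption: median +/- k*MAD is a stand-in for the paper's density-based
    outlier smoothing, so a few extreme samples do not widen a class range.
    """
    intervals = {}
    for c in np.unique(y):
        v = x[y == c]
        med = np.median(v)
        mad = np.median(np.abs(v - med))
        intervals[c] = (med - k * mad, med + k * mad)
    return intervals


def coverage_matrix(X, y, k=2.0):
    """covered[g, s] is True when gene g puts sample s inside the interval of
    its own class and outside the intervals of all other classes.

    X has shape (n_samples, n_genes); y holds one class label per sample.
    """
    n_samples, n_genes = X.shape
    covered = np.zeros((n_genes, n_samples), dtype=bool)
    for g in range(n_genes):
        intervals = class_intervals(X[:, g], y, k)
        for s in range(n_samples):
            value = X[s, g]
            lo_own, hi_own = intervals[y[s]]
            in_own = lo_own <= value <= hi_own
            in_other = any(
                lo <= value <= hi for c, (lo, hi) in intervals.items() if c != y[s]
            )
            covered[g, s] = in_own and not in_other
    return covered


def rank_genes(covered):
    """Hypothetical overlap score: fraction of samples a gene fails to place
    unambiguously. Lower is better, so genes are sorted in ascending order."""
    overlap = 1.0 - covered.mean(axis=1)
    return np.argsort(overlap, kind="stable"), overlap


def minimum_gene_set(covered):
    """Greedy set cover: keep adding the gene that covers the most training
    samples not yet covered, until every coverable sample is covered."""
    remaining = covered.any(axis=0)  # samples that at least one gene can cover
    selected = []
    while remaining.any():
        gains = (covered & remaining).sum(axis=1)
        best = int(np.argmax(gains))  # ties go to the lowest gene index
        selected.append(best)
        remaining &= ~covered[best]
    return selected


if __name__ == "__main__":
    # Toy data: 6 samples (rows), 3 genes (columns), two classes.
    X = np.array([[1, 1, 9], [1, 5, 9], [1, 3, 1],
                  [1, 1, 1], [9, 5, 1], [9, 3, 1]], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])
    covered = coverage_matrix(X, y)
    order, overlap = rank_genes(covered)
    print("ranking:", order.tolist())                 # ranking: [0, 2, 1]
    print("minimum set:", minimum_gene_set(covered))  # minimum set: [0, 2]
```

In the toy data, gene 1 overlaps completely between classes and is ranked last, while genes 0 and 2 each miss one sample; neither alone covers the training set, so the greedy step returns both. Greedy set cover is a standard approximation and is not guaranteed to find the true minimum in general.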
Pages: 717-737
Number of pages: 21