A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform

Times cited: 83
Authors
Zhuang, Joanna [1 ,2 ]
Widschwendter, Martin [2 ]
Teschendorff, Andrew E. [1 ]
Affiliations
[1] UCL, UCL Canc Inst, Stat Genom Grp, London WC1E 6BT, England
[2] UCL, UCL Elizabeth Garrett Anderson Inst Womens Hlth, Dept Womens Canc, London WC1E 6AU, England
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
DNA methylation; Classification; Feature selection; Beadarrays; SINGULAR-VALUE DECOMPOSITION; NONNEGATIVE MATRIX FACTORIZATION; GENE-EXPRESSION; STEM-CELLS; MICROARRAY; VALIDATION; WIDESPREAD; PREDICTION; DISCOVERY; ALGORITHM;
DOI
10.1186/1471-2105-13-59
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: The 27k Illumina Infinium Methylation BeadChip is a popular high-throughput technology that allows the methylation state of over 27,000 CpGs to be assayed. While feature selection and classification methods have been comprehensively explored in the context of gene expression data, relatively little is known about how best to perform feature selection or classification on Illumina Infinium methylation data. Given the rising importance of epigenomics in cancer and other complex genetic diseases, and in view of the upcoming epigenome-wide association studies, it is critical to identify the statistical methods that offer improved inference in this novel context.
Results: Using a total of 7 large Illumina Infinium 27k Methylation data sets, encompassing over 1,000 samples from a wide range of tissues, we provide an evaluation of popular feature selection, dimensional reduction and classification methods on DNA methylation data. Specifically, we evaluate the effects of variance filtering, supervised principal components (SPCA) and the choice of DNA methylation quantification measure on downstream statistical inference. We show that for relatively large sample sizes, feature selection using test statistics is similar for M- and beta-values, but that in the limit of small sample sizes, M-values allow more reliable identification of true positives. We also show that the effect of variance filtering on feature selection is study-specific and depends on the phenotype of interest and the tissue type profiled. Specifically, we find that variance filtering improves the detection of true positives in studies with large effect sizes, but that it may lead to worse performance in studies with smaller yet significant effect sizes. In contrast, SPCA improves statistical power, especially in studies with small effect sizes. We also demonstrate that classification using the Elastic Net and Support Vector Machine (SVM) clearly outperforms competing methods like LASSO and SPCA. Finally, in unsupervised modelling of cancer diagnosis, we find that non-negative matrix factorisation (NMF) clearly outperforms principal components analysis.
Conclusions: Our results highlight the importance of tailoring the feature selection and classification methodology to the sample size and biological context of the DNA methylation study. The Elastic Net emerges as a powerful classification algorithm for large-scale DNA methylation studies, while NMF does well in the unsupervised context. The insights presented here will be useful to any study embarking on large-scale DNA methylation profiling using Illumina Infinium beadarrays.
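The abstract contrasts beta- and M-values as methylation measures and evaluates variance filtering as a pre-filter for feature selection. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' pipeline. It assumes the standard logit relationship M = log2(beta / (1 - beta)) between the two measures (ignoring the intensity offsets used in practice) and a variance filter that simply keeps the k CpGs with the largest sample variance. The function names `beta_to_m`, `m_to_beta` and `variance_filter`, the CpG identifiers `cg_A`, `cg_B`, `cg_C`, and the data values are hypothetical and for illustration only.

```python
import math
import statistics


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value (fraction methylated, in [0, 1]) to an M-value.

    Uses M = log2(beta / (1 - beta)). Clipping to [eps, 1 - eps] is an
    assumption of this sketch that avoids log(0) and division by zero
    for fully methylated or unmethylated probes.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (1 + 2^M)."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


def variance_filter(data, k):
    """Return the IDs of the k CpGs with the largest sample variance.

    `data` maps a CpG identifier to its values across samples
    (at least two samples are needed for a sample variance).
    """
    ranked = sorted(data, key=lambda cpg: statistics.variance(data[cpg]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 3 CpGs x 4 samples of beta-values (illustrative only).
betas = {
    "cg_A": [0.10, 0.12, 0.11, 0.09],  # low, stable methylation
    "cg_B": [0.20, 0.80, 0.25, 0.75],  # strongly variable across samples
    "cg_C": [0.50, 0.55, 0.45, 0.60],  # intermediate, mildly variable
}

# Transform to the M-value scale, then keep the 2 most variable CpGs.
m_vals = {cpg: [beta_to_m(b) for b in row] for cpg, row in betas.items()}
top = variance_filter(m_vals, k=2)
print(top)  # ['cg_B', 'cg_C']
```

Because a filter of this kind ranks CpGs without reference to the phenotype, a CpG with a small but real association may be discarded if its overall variance across samples is low.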
Pages: 14