A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform

Cited by: 83
Authors
Zhuang, Joanna [1 ,2 ]
Widschwendter, Martin [2 ]
Teschendorff, Andrew E. [1 ]
Affiliations
[1] UCL, UCL Canc Inst, Stat Genom Grp, London WC1E 6BT, England
[2] UCL, UCL Elizabeth Garrett Anderson Inst Womens Hlth, Dept Womens Canc, London WC1E 6AU, England
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
DNA methylation; Classification; Feature selection; Beadarrays; SINGULAR-VALUE DECOMPOSITION; NONNEGATIVE MATRIX FACTORIZATION; GENE-EXPRESSION; STEM-CELLS; MICROARRAY; VALIDATION; WIDESPREAD; PREDICTION; DISCOVERY; ALGORITHM;
DOI
10.1186/1471-2105-13-59
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: The 27k Illumina Infinium Methylation Beadchip is a popular high-throughput technology that allows the methylation state of over 27,000 CpGs to be assayed. While feature selection and classification methods have been comprehensively explored in the context of gene expression data, relatively little is known as to how best to perform feature selection or classification in the context of Illumina Infinium methylation data. Given the rising importance of epigenomics in cancer and other complex genetic diseases, and in view of the upcoming epigenome wide association studies, it is critical to identify the statistical methods that offer improved inference in this novel context.

Results: Using a total of 7 large Illumina Infinium 27k Methylation data sets, encompassing over 1,000 samples from a wide range of tissues, we here provide an evaluation of popular feature selection, dimensional reduction and classification methods on DNA methylation data. Specifically, we evaluate the effects of variance filtering, supervised principal components (SPCA) and the choice of DNA methylation quantification measure on downstream statistical inference. We show that for relatively large sample sizes feature selection using test statistics is similar for M and beta-values, but that in the limit of small sample sizes, M-values allow more reliable identification of true positives. We also show that the effect of variance filtering on feature selection is study-specific and dependent on the phenotype of interest and tissue type profiled. Specifically, we find that variance filtering improves the detection of true positives in studies with large effect sizes, but that it may lead to worse performance in studies with smaller yet significant effect sizes. In contrast, supervised principal components improves the statistical power, especially in studies with small effect sizes. We also demonstrate that classification using the Elastic Net and Support Vector Machine (SVM) clearly outperforms competing methods like LASSO and SPCA. Finally, in unsupervised modelling of cancer diagnosis, we find that non-negative matrix factorisation (NMF) clearly outperforms principal components analysis.

Conclusions: Our results highlight the importance of tailoring the feature selection and classification methodology to the sample size and biological context of the DNA methylation study. The Elastic Net emerges as a powerful classification algorithm for large-scale DNA methylation studies, while NMF does well in the unsupervised context. The insights presented here will be useful to any study embarking on large-scale DNA methylation profiling using Illumina Infinium beadarrays.
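
As a rough illustration of two operations the abstract refers to, the sketch below converts beta-values to M-values and ranks CpGs by variance. It is not taken from the paper: it assumes the commonly used definition of the M-value as the base-2 logit of the beta-value, M = log2(beta / (1 - beta)), and the function names, the clipping constant `eps`, and the CpG identifiers are hypothetical.

```python
import math
from statistics import pvariance


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value in [0, 1] to an M-value (assumed: base-2 logit)."""
    # Clip away from 0 and 1 so the logarithm stays finite (eps is an assumption).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: map an M-value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def top_variance_features(matrix, k):
    """Return the k CpG ids with the highest variance across samples.

    matrix maps a CpG id to its list of per-sample values.
    """
    ranked = sorted(matrix, key=lambda cpg: pvariance(matrix[cpg]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: three CpGs measured in three samples (beta-values).
betas = {
    "cg_a": [0.1, 0.9, 0.5],
    "cg_b": [0.5, 0.5, 0.5],
    "cg_c": [0.4, 0.6, 0.5],
}
m_values = {cpg: [beta_to_m(b) for b in vals] for cpg, vals in betas.items()}
selected = top_variance_features(m_values, k=2)
```

Filtering on M-values rather than beta-values is only one possible choice here; the abstract reports that the benefit of variance filtering depends on the study, so the cut-off `k` would need to be tuned per data set.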
Pages: 14