Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data

Cited by: 8
Authors
Kim, Kipoong [1]
Sun, Hokeun [1]
Affiliations
[1] Pusan Natl Univ, Dept Stat, Busan 46241, South Korea
Funding
National Research Foundation of Singapore
Keywords
DNA methylation; Genetic network; Regularization; Dimension reduction; BREAST-CANCER; VARIABLE SELECTION; COMPONENT ANALYSIS; EXPRESSION DATA; REGRESSION; REGULARIZATION; VARIANTS; PROTEIN; LUNG; PREDICTION;
DOI
10.1186/s12859-019-3040-x
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods that use prior biological network knowledge, such as genetic pathways and signaling pathways, can outperform methods that ignore genetic network structure in terms of true positive selection. In recent epigenetic research on case-control association studies, many statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most existing methods are not designed to use genetic network information, even though methylation levels of linked genes in a genetic network tend to be highly correlated with each other.
Results: We propose a new approach that combines data dimension reduction techniques with network-based regularization to identify outcome-related genes in the analysis of high-dimensional DNA methylation data. In simulation studies, we demonstrated that the proposed approach outperforms other statistical methods that do not use genetic network information in terms of true positive selection. We also applied it to the 450K DNA methylation array data of the four breast invasive carcinoma cancer subtypes from The Cancer Genome Atlas (TCGA) project.
Conclusions: The proposed variable selection approach can use prior biological network information in the analysis of high-dimensional DNA methylation array data. It first captures gene-level signals from multiple CpG sites using a data dimension reduction technique and then performs network-based regularization based on biological network graph information. It can select potentially cancer-related genes and genetic pathways that were missed by existing methods.
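The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which dimension reduction technique or which penalty is used, so everything concrete here is an assumption: the first principal component stands in for the gene-level summary, and the penalty is an L1 term plus a quadratic term built from a normalized graph Laplacian, fitted by proximal gradient descent. The function names (`gene_level_scores`, `normalized_laplacian`, `fit_network_logistic`) and the parameters `lam` and `alpha` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gene_level_scores(X, gene_of_cpg):
    """Summarize each gene's CpG columns by the first principal component.

    X is an (n_samples, n_cpgs) methylation matrix and gene_of_cpg gives the
    gene label of each column. PCA is an assumed choice of reduction; note
    that the sign of each component is arbitrary.
    """
    genes = sorted(set(gene_of_cpg))
    scores = np.empty((X.shape[0], len(genes)))
    for j, gene in enumerate(genes):
        cols = [i for i, name in enumerate(gene_of_cpg) if name == gene]
        block = X[:, cols] - X[:, cols].mean(axis=0)
        u, s, _ = np.linalg.svd(block, full_matrices=False)
        scores[:, j] = u[:, 0] * s[0]  # first principal component scores
    return genes, scores


def normalized_laplacian(A):
    """Normalized graph Laplacian I - D^{-1/2} A D^{-1/2} of adjacency A.

    Isolated genes (degree 0) get a zero row and column.
    """
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    connected = np.diag((d > 0).astype(float))
    return connected - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def fit_network_logistic(Z, y, L, lam=0.1, alpha=0.5, n_iter=2000):
    """Logistic regression with an assumed network penalty.

    Minimizes mean logistic loss
        + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * beta' L beta)
    by proximal gradient descent on standardized gene-level scores.
    """
    n, p = Z.shape
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    beta = np.zeros(p)
    b0 = 0.0
    # Step size from a Lipschitz bound on the smooth part of the objective.
    lipschitz = (0.25 * np.linalg.norm(Z, 2) ** 2 / n
                 + lam * (1 - alpha) * np.linalg.norm(L, 2))
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + Z @ beta)))
        grad = Z.T @ (prob - y) / n + lam * (1 - alpha) * (L @ beta)
        b0 -= step * np.mean(prob - y)
        beta = beta - step * grad
        # Soft-thresholding handles the L1 part and produces exact zeros.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam * alpha, 0.0)
    return b0, beta
```

In this sketch, genes whose fitted coefficient is nonzero would be the selected ones, and the Laplacian term pulls coefficients of linked genes toward each other, which is the intuition behind using network information when methylation levels of linked genes are correlated.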
Pages: 15
Related papers
50 in total
  • [1] Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data
    Kipoong Kim
    Hokeun Sun
    BMC Bioinformatics, 20
  • [2] Incorporating Genetic Networks into Case-Control Association Studies with High-Dimensional DNA Methylation Data
    Kim, Kipoong
    Sun, Hokeun
    GENETIC EPIDEMIOLOGY, 2017, 41 (07) : 661 - 662
  • [3] Penalized logistic regression for high-dimensional DNA methylation data with case-control studies
    Sun, Hokeun
    Wang, Shuang
    BIOINFORMATICS, 2012, 28 (10) : 1368 - 1375
  • [4] Gene selection by incorporating genetic networks into case-control association studies
    Xuewei Cao
    Xiaoyu Liang
    Shuanglin Zhang
    Qiuying Sha
    European Journal of Human Genetics, 2024, 32 : 270 - 277
  • [5] Gene selection by incorporating genetic networks into case-control association studies
    Cao, Xuewei
    Liang, Xiaoyu
    Zhang, Shuanglin
    Sha, Qiuying
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2024, 32 (03) : 270 - 277
  • [6] Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data
    Sun, Hokeun
    Wang, Shuang
    STATISTICS IN MEDICINE, 2013, 32 (12) : 2127 - 2139
  • [7] Data quality control in genetic case-control association studies
    Anderson, Carl A.
    Pettersson, Fredrik H.
    Clarke, Geraldine M.
    Cardon, Lon R.
    Morris, Andrew P.
    Zondervan, Krina T.
    NATURE PROTOCOLS, 2010, 5 (09) : 1564 - 1573
  • [8] Data quality control in genetic case-control association studies
    Carl A Anderson
    Fredrik H Pettersson
    Geraldine M Clarke
    Lon R Cardon
    Andrew P Morris
    Krina T Zondervan
    Nature Protocols, 2010, 5 : 1564 - 1573
  • [9] Evaluation of Public Control Data and Case-control Ratios for Genetic Association Studies
    Adrianto, Indra
    Lessard, Christopher J.
    Adler, Adam
    Kaufman, Kenneth M.
    Moser, Kathy L.
    Gray-McGuire, Courtney
    GENETIC EPIDEMIOLOGY, 2010, 34 (08) : 943 - 944
  • [10] Matched Forest: supervised learning for high-dimensional matched case-control studies
    Zadeh, Nooshin Shomal
    Lin, Sangdi
    Runger, George C.
    BIOINFORMATICS, 2020, 36 (05) : 1570 - 1576