Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies

Cited by: 468
Authors
Jaffe, Andrew E. [1, 2, 3]
Murakami, Peter [3]
Lee, Hwajin [3]
Leek, Jeffrey T. [1]
Fallin, M. Daniele [1, 2, 3, 4]
Feinberg, Andrew P. [1, 3, 4]
Irizarry, Rafael A. [1, 3]
Affiliations
[1] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD 21205 USA
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Epidemiol, Baltimore, MD 21205 USA
[3] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Keywords
Epigenetic epidemiology; DNA methylation; genome-wide analysis; bump hunting; batch effects; SURROGATE VARIABLE ANALYSIS; PLURIPOTENT STEM-CELLS; CPG ISLAND SHORES; DNA METHYLATION; GENE-EXPRESSION; CANCER; DISEASE; ARRAYS; MODE; CHIP
DOI
10.1093/ije/dyr238
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: During the past 5 years, high-throughput technologies have been successfully used by epidemiology studies, but almost all have focused on sequence variation through genome-wide association studies (GWAS). Today, the study of other genomic events is becoming more common in large-scale epidemiological studies. Many of these, unlike the single-nucleotide polymorphisms studied in GWAS, are continuous measures. In this context, the exercise of searching for regions of interest for disease is akin to the problems described in the statistical 'bump hunting' literature.
Methods: New statistical challenges arise when the measurements are continuous rather than categorical, when they are measured with uncertainty, and when both biological signal and measurement error are characterized by spatial correlation along the genome. Perhaps the most challenging complication is that continuous genomic data from large studies are measured throughout long periods, making them susceptible to 'batch effects'. An example that combines all three characteristics is genome-wide DNA methylation measurements. Here, we present a data analysis pipeline that effectively models measurement error, removes batch effects, detects regions of interest and attaches statistical uncertainty to identified regions.
Results: We illustrate the usefulness of our approach by detecting genomic regions of DNA methylation associated with a continuous trait in a well-characterized population of newborns. Additionally, we show that addressing unexplained heterogeneity like batch effects reduces the number of false-positive regions.
Conclusions: Our framework offers a comprehensive yet flexible approach for identifying genomic regions of biological interest in large epidemiological studies using quantitative high-throughput methods.
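The region-detection and uncertainty steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the running-mean smoother (standing in for a loess-type smoother), the fixed cutoff, and the permutation scheme based on the largest null bump are all assumptions made for illustration, and the batch-effect adjustment step is omitted entirely.

```python
import numpy as np

def probe_slopes(meth, trait):
    """Least-squares slope of each probe's methylation on the trait.

    meth has shape (n_samples, n_probes), with probes ordered by genomic position.
    """
    x = trait - trait.mean()
    return (meth - meth.mean(axis=0)).T @ x / (x @ x)

def smooth(values, window=5):
    """Running mean along the genome (an assumed stand-in for loess)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")

def find_bumps(smoothed, cutoff):
    """Return (start, end, area) for each run of probes with |value| > cutoff."""
    above = np.abs(smoothed) > cutoff
    bumps, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            bumps.append((start, i - 1, float(np.abs(smoothed[start:i]).sum())))
            start = None
    if start is not None:
        bumps.append((start, len(above) - 1, float(np.abs(smoothed[start:]).sum())))
    return bumps

def bump_hunt(meth, trait, cutoff, n_perm=200, seed=0):
    """Find bumps and attach a permutation p-value to each bump's area.

    Assumption: the null distribution is the largest bump area seen in each
    permutation of the trait (0 when a permutation yields no bump).
    """
    rng = np.random.default_rng(seed)
    observed = find_bumps(smooth(probe_slopes(meth, trait)), cutoff)
    null_max = np.zeros(n_perm)
    for b in range(n_perm):
        permuted = rng.permutation(trait)
        null = find_bumps(smooth(probe_slopes(meth, permuted)), cutoff)
        null_max[b] = max((area for _, _, area in null), default=0.0)
    return [
        (start, end, area, (1 + np.sum(null_max >= area)) / (1 + n_perm))
        for start, end, area in observed
    ]
```

Calling `bump_hunt(meth, trait, cutoff)` on a samples-by-probes matrix returns one `(start, end, area, p_value)` tuple per candidate region, where `start` and `end` are probe indices. In a real analysis, covariates that capture batch effects would need to enter the per-probe regression, which this sketch does not attempt.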
Pages: 200-209
Page count: 10