The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities

Times cited: 53
Authors
Beesley, Lauren J. [1]
Salvatore, Maxwell [1]
Fritsche, Lars G. [1]
Pandit, Anita [1]
Rao, Arvind [2]
Brummett, Chad [3]
Willer, Cristen J. [2]
Lisabeth, Lynda D. [4]
Mukherjee, Bhramar [1]
Affiliations
[1] University of Michigan, Department of Biostatistics, Ann Arbor, MI 48109, USA
[2] University of Michigan, Department of Computational Medicine and Bioinformatics, Ann Arbor, MI 48109, USA
[3] University of Michigan, Department of Anesthesiology, Ann Arbor, MI 48109, USA
[4] University of Michigan, Department of Epidemiology, Ann Arbor, MI 48109, USA
Funding
US National Science Foundation
Keywords
biobanks; electronic health records; Michigan Genomics Initiative; UK Biobank; selection bias; multiple-testing correction; phenome-wide association; controlled case series; big data; Mendelian randomization; genetic association; causal inference; polygenic scores; disease; heterogeneity
DOI
10.1002/sim.8445
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
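The abstract's emphasis on agnostic, hypothesis-generating searches across many disease associations carries a statistical cost that the keywords flag: testing many phenotypes at once inflates false positives unless the significance threshold is adjusted. The Python sketch below is an illustration under stated assumptions, not the authors' method. It applies two standard corrections, Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate), to a dictionary of p-values. The function names, phenotype labels `P1` to `P5`, and the p-values are hypothetical toy values, not results from the Michigan Genomics Initiative or UK Biobank.

```python
# Illustrative sketch only: multiple-testing correction for a phenome-wide scan.
# Function names, phenotype labels, and p-values are hypothetical.


def bonferroni_hits(pvalues, alpha=0.05):
    """Return phenotypes with p < alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    if m == 0:
        return []
    threshold = alpha / m
    return sorted(code for code, p in pvalues.items() if p < threshold)


def benjamini_hochberg_hits(pvalues, q=0.05):
    """Return phenotypes rejected by the Benjamini-Hochberg step-up procedure (FDR <= q)."""
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    # Find the largest rank k such that p_(k) <= k * q / m.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k * q / m:
            cutoff = k
    # Reject every hypothesis whose rank is at most that k.
    return sorted(code for code, _ in ranked[:cutoff])


# Hypothetical association p-values for five phenotypes (toy values, not real data).
toy_pvalues = {"P1": 0.0001, "P2": 0.004, "P3": 0.019, "P4": 0.03, "P5": 0.5}

print("Bonferroni:", bonferroni_hits(toy_pvalues))
print("Benjamini-Hochberg:", benjamini_hochberg_hits(toy_pvalues))
```

With these toy values, Bonferroni (threshold 0.05/5 = 0.01) keeps only P1 and P2, while Benjamini-Hochberg also keeps P3 and P4, illustrating the trade-off between the stricter family-wise criterion and the more permissive false-discovery-rate criterion in exploratory scans.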
Pages: 773-800 (28 pages)