Toward Rigorous Data Harmonization in Cancer Epidemiology Research: One Approach

Cited by: 44
Authors
Rolland, Betsy [1, 2]
Reid, Suzanna [2 ]
Stelling, Deanna [2 ]
Warnick, Greg [2 ]
Thornquist, Mark [2 ]
Feng, Ziding [3 ]
Potter, John D. [2, 4, 5]
Affiliations
[1] NCI, Canc Prevent Fellowship Program, Bethesda, MD 20892 USA
[2] Fred Hutchinson Canc Res Ctr, Div Publ Hlth Sci, Seattle, WA 98104 USA
[3] Univ Texas MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
[4] Massey Univ, Ctr Publ Hlth Res, Wellington, New Zealand
[5] Univ Washington, Sch Publ Hlth, Dept Epidemiol, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA)
Keywords
cancer epidemiology; data harmonization; data pooling; ASIA COHORT CONSORTIUM; BODY-MASS INDEX; DATASHAPER APPROACH; POOLED ANALYSIS; RISK; ASSOCIATION; DEATH;
DOI
10.1093/aje/kwv133
Chinese Library Classification
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Cancer epidemiologists have a long history of combining data sets in pooled analyses, often harmonizing heterogeneous data from multiple studies into 1 large data set. Although there are useful websites on data harmonization with recommendations and support, there is little research on best practices in data harmonization; each project conducts harmonization according to its own internal standards. The field would be greatly served by charting the process of data harmonization to enhance the quality of the harmonized data. Here, we describe the data harmonization process utilized at the Fred Hutchinson Cancer Research Center (Seattle, Washington) by the coordinating centers of several research projects. We describe a 6-step harmonization process, including: 1) identification of questions the harmonized data set is required to answer; 2) identification of high-level data concepts to answer those questions; 3) assessment of data availability for data concepts; 4) development of common data elements for each data concept; 5) mapping and transformation of individual data points to common data elements; and 6) quality-control procedures. Our aim here is not to claim a "correct" way of doing data harmonization but to encourage others to describe their processes in order that we can begin to create rigorous approaches. We also propose a research agenda around this issue.
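As a rough illustration of steps 5 and 6 above (mapping individual data points to a common data element, then a quality-control pass), the following is a minimal sketch and not the authors' implementation. The study names, field names, code values, and the `smoking_status` data element are all hypothetical, chosen only to show the shape of the task.

```python
from typing import Any, Dict, List

# Hypothetical common data element (CDE): smoking status with a fixed value set.
SMOKING_CDE_VALUES = {"never", "former", "current", "unknown"}

# Hypothetical per-study mappings from each study's own coding to the CDE values.
STUDY_MAPPINGS: Dict[str, Dict[str, Any]] = {
    "study_a": {"source_field": "smk",
                "codes": {0: "never", 1: "former", 2: "current"}},
    "study_b": {"source_field": "smoker",
                "codes": {"N": "never", "EX": "former", "Y": "current"}},
}


def harmonize_record(study: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Step 5 (simplified): map one study-specific record to the CDE."""
    mapping = STUDY_MAPPINGS[study]
    raw = record.get(mapping["source_field"])
    # Missing or unrecognized source codes are mapped to "unknown", not guessed.
    value = mapping["codes"].get(raw, "unknown")
    return {"study": study, "id": record["id"], "smoking_status": value}


def quality_check(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Step 6 (simplified): return rows whose value is outside the CDE value set."""
    return [r for r in rows if r["smoking_status"] not in SMOKING_CDE_VALUES]


pooled = [
    harmonize_record("study_a", {"id": 1, "smk": 2}),
    harmonize_record("study_b", {"id": 7, "smoker": "EX"}),
    harmonize_record("study_b", {"id": 8, "smoker": None}),
]
```

Keeping the per-study mappings as data rather than code is one possible design choice: it leaves a reviewable record of each transformation decision, which fits the abstract's call for documenting the harmonization process.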
Pages: 1033-1038
Page count: 6