Strategies and Lessons Learned During Cleaning of Data From Research Panel Participants: Cross-sectional Web-Based Health Behavior Survey Study

Cited by: 12
Authors
Arevalo, Mariana [1]
Brownstein, Naomi C. [2, 3, 4]
Whiting, Junmin [2]
Meade, Cathy D. [1, 3, 5]
Gwede, Clement K. [1, 3, 5, 6]
Vadaparampil, Susan T. [1, 3, 7]
Tillery, Kristin J. [8]
Islam, Jessica Y. [3, 7, 9]
Giuliano, Anna R. [3, 7, 9]
Christy, Shannon M. [1, 3, 6, 7]
Affiliations
[1] H Lee Moffitt Canc Ctr & Res Inst, Dept Hlth Outcomes & Behav, 12902 Magnolia Dr, Tampa, FL 33612 USA
[2] H Lee Moffitt Canc Ctr & Res Inst, Dept Biostat & Bioinformat, Tampa, FL 33612 USA
[3] Univ S Florida, Dept Oncol Sci, Tampa, FL USA
[4] Med Univ South Carolina, Dept Publ Hlth Sci, Charleston, SC USA
[5] H Lee Moffitt Canc Ctr & Res Inst, Dept Genitourinary Oncol, Tampa, FL 33612 USA
[6] H Lee Moffitt Canc Ctr & Res Inst, Dept Gastrointestinal Oncol, Tampa, FL 33612 USA
[7] H Lee Moffitt Canc Ctr & Res Inst, Ctr Immunizat & Infect Res Canc, Tampa, FL 33612 USA
[8] H Lee Moffitt Canc Ctr & Res Inst, Participant Res Intervent & Measurement Core, Tampa, FL 33612 USA
[9] H Lee Moffitt Canc Ctr & Res Inst, Dept Canc Epidemiol, Tampa, FL 33612 USA
Funding
National Institutes of Health (USA);
Keywords
data cleaning; data management; data integrity; quality assessment; research panel; web-based survey; interdisciplinary research; surveys and questionnaires; health behavior; internet; INTERNET RESEARCH; RESPONSES;
DOI
10.2196/35797
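Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract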
Background: The use of web-based methods to collect population-based health behavior data has burgeoned over the past two decades. Researchers have used web-based platforms and research panels to study a myriad of topics. Data cleaning prior to statistical analysis of web-based survey data is an important step for data integrity. However, the data cleaning processes used by research teams are often not reported.
Objective: The objectives of this manuscript are to describe the use of a systematic approach to clean the data collected via a web-based platform from panelists and to share lessons learned with other research teams to promote high-quality data cleaning process improvements.
Methods: Data for this web-based survey study were collected from a research panel that is available for scientific and marketing research. Participants (N=4000) were panelists recruited either directly or through verified partners of the research panel, were aged 18 to 45 years, were living in the United States, had proficiency in the English language, and had access to the internet. Eligible participants completed a health behavior survey via Qualtrics. Informed by recommendations from the literature, our interdisciplinary research team developed and implemented a systematic and sequential plan to inform data cleaning processes. This included the following: (1) reviewing survey completion speed, (2) identifying consecutive responses, (3) identifying cases with contradictory responses, and (4) assessing the quality of open-ended responses. Implementation of these strategies is described in detail, and the Checklist for E-Survey Data Integrity is offered as a tool for other investigators.
Results: Data cleaning procedures resulted in the removal of 1278 out of 4000 (31.95%) response records, which failed one or more data quality checks. First, approximately one-sixth of records (n=648, 16.20%) were removed because respondents completed the survey unrealistically quickly (ie, <10 minutes). Next, 7.30% (n=292) of records were removed because they contained evidence of consecutive responses. A total of 4.68% (n=187) of records were subsequently removed due to instances of conflicting responses. Finally, a total of 3.78% (n=151) of records were removed due to poor-quality open-ended responses. Thus, after these data cleaning steps, the final sample contained 2722 responses, representing 68.05% of the original sample.
Conclusions: Examining data integrity and promoting transparency of data cleaning reporting is imperative for web-based survey research. Ensuring a high quality of data both prior to and following data collection is important. Our systematic approach helped eliminate records flagged as being of questionable quality. Data cleaning and management procedures should be reported more frequently, and systematic approaches should be adopted as standards of good practice in this type of research.
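The four sequential checks named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the 10-minute speed cutoff comes from the abstract. The field names (`duration_min`, `grid_answers`, `ever_smoked`, `cigarettes_per_day`, `open_text`), the run-length threshold of 8, the smoking contradiction rule, and the letter-count heuristic for open-ended answers are all hypothetical stand-ins, and the last of these replaces what would likely be a human review in practice.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
Check = Tuple[str, Callable[[Record], bool]]


def longest_run(answers: Sequence[Any]) -> int:
    """Length of the longest streak of identical consecutive answers."""
    best = current = 0
    previous = object()  # sentinel that equals no real answer
    for answer in answers:
        current = current + 1 if answer == previous else 1
        best = max(best, current)
        previous = answer
    return best


def too_fast(rec: Record, min_minutes: float = 10.0) -> bool:
    # Step 1: completion speed; the <10 minute cutoff is from the abstract.
    return float(rec["duration_min"]) < min_minutes


def consecutive_responses(rec: Record, max_run: int = 8) -> bool:
    # Step 2: the same option chosen many times in a row.
    # The threshold of 8 is an assumption, not a value from the paper.
    return longest_run(rec["grid_answers"]) >= max_run


def contradictory(rec: Record) -> bool:
    # Step 3: hypothetical rule, reports never smoking yet a nonzero
    # number of cigarettes per day.
    return rec["ever_smoked"] is False and rec["cigarettes_per_day"] > 0


def poor_open_text(rec: Record) -> bool:
    # Step 4: crude stand-in for manual review, flags answers with
    # fewer than 3 letters.
    return sum(ch.isalpha() for ch in rec["open_text"]) < 3


CHECKS: List[Check] = [
    ("speed", too_fast),
    ("consecutive", consecutive_responses),
    ("contradictory", contradictory),
    ("open_ended", poor_open_text),
]


def clean_sequentially(
    records: List[Record], checks: List[Check]
) -> Tuple[List[Record], Dict[str, int]]:
    """Apply each check only to records that survived the earlier ones,
    so every removed record is counted under exactly one step."""
    kept = list(records)
    removed: Dict[str, int] = {}
    for name, fails in checks:
        survivors = [rec for rec in kept if not fails(rec)]
        removed[name] = len(kept) - len(survivors)
        kept = survivors
    return kept, removed
```

Because each check runs only on the survivors of the previous one, the per-step counts add up to the total removed, which matches how the Results are reported: 648 + 292 + 187 + 151 = 1278 removed, leaving 4000 - 1278 = 2722 records.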
Pages: 13