Prevalence of Missing Data in the National Cancer Database and Association With Overall Survival

Cited by: 84
Authors
Yang, Daniel X. [1]
Khera, Rohan [2, 3]
Miccio, Joseph A. [1]
Jairam, Vikram [1]
Chang, Enoch [1]
Yu, James B. [1]
Park, Henry S. [1]
Krumholz, Harlan M. [2, 3]
Aneja, Sanjay [1, 3]
Affiliations
[1] Yale Sch Med, Dept Therapeut Radiol, 330 Cedar St,CB326, New Haven, CT 06520 USA
[2] Yale Sch Med, Dept Internal Med, New Haven, CT 06520 USA
[3] Yale Sch Med, Ctr Outcomes Res & Evaluat, New Haven, CT 06520 USA
Keywords
STEREOTACTIC BODY RADIOTHERAPY; REAL-WORLD EVIDENCE; MULTIPLE IMPUTATION; DATA-BASE; STAGE; SURVEILLANCE; RECEIPT; CARE; INTEROPERABILITY; IMPLEMENTATION;
DOI
10.1001/jamanetworkopen.2021.1793
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
IMPORTANCE Cancer registries are important real-world data sources consisting of data abstraction from the medical record; however, patients with unknown or missing data are underrepresented in studies that use such data sources.
OBJECTIVE To assess the prevalence of missing data and its association with overall survival among patients with cancer.
DESIGN, SETTING, AND PARTICIPANTS In this retrospective cohort study, all variables within the National Cancer Database were reviewed for missing or unknown values for patients with the 3 most common cancers in the US who received diagnoses from January 1, 2006, to December 31, 2015. The prevalence of patient records with missing data and the association with overall survival were assessed. Data analysis was performed from February to August 2020.
EXPOSURES Any missing data field within a patient record among 63 variables of interest from more than 130 total variables in the National Cancer Database.
MAIN OUTCOMES AND MEASURES Prevalence of missing data in the medical records of patients with cancer and associated 2-year overall survival.
RESULTS A total of 1 198 749 patients with non-small cell lung cancer (mean [SD] age, 68.5 [10.9] years; 628 811 men [52.5%]), 2 120 775 patients with breast cancer (mean [SD] age, 61.0 [13.3] years; 2 101 758 women [99.1%]), and 1 158 635 patients with prostate cancer (mean [SD] age, 65.2 [9.0] years; 100% men) were included in the analysis. Among those with non-small cell lung cancer, 851 295 patients (71.0%) were missing data for variables of interest; 2-year overall survival was 33.2% for patients with missing data and 51.6% for patients with complete data (P<.001). Among those with breast cancer, 1 161 096 patients (54.7%) were missing data for variables of interest; 2-year overall survival was 93.2% for patients with missing data and 93.9% for patients with complete data (P<.001). Among those with prostate cancer, 460 167 patients (39.7%) were missing data for variables of interest; 2-year overall survival was 91.0% for patients with missing data and 95.6% for patients with complete data (P<.001).
CONCLUSIONS AND RELEVANCE This study found that within a large cancer registry-based real-world data source, there was a high prevalence of missing data that were unable to be ascertained from the medical record. The prevalence of missing data among patients with cancer was associated with heterogeneous differences in overall survival. Improvements in documentation and data quality are necessary to make optimal use of real-world data for clinical advancements.
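As a rough arithmetic check on the RESULTS figures, the sketch below recomputes the prevalence of missing data from the reported patient counts and the absolute gap in 2-year overall survival between patients with complete and missing data. It only reuses numbers quoted in the abstract; the variable and function names (`reported_counts`, `two_year_os`, `missing_prevalence`) are illustrative assumptions, and this is not the authors' analysis code.

```python
# Counts quoted in the abstract: (patients with any missing data, total patients).
reported_counts = {
    "non-small cell lung cancer": (851_295, 1_198_749),
    "breast cancer": (1_161_096, 2_120_775),
    "prostate cancer": (460_167, 1_158_635),
}

# 2-year overall survival (%) quoted in the abstract: (missing data, complete data).
two_year_os = {
    "non-small cell lung cancer": (33.2, 51.6),
    "breast cancer": (93.2, 93.9),
    "prostate cancer": (91.0, 95.6),
}


def missing_prevalence(missing: int, total: int) -> float:
    """Percentage of patient records with at least one missing field."""
    return 100.0 * missing / total


# Prevalence of missing data per cancer type, rounded to one decimal place.
prevalence = {
    cancer: round(missing_prevalence(missing, total), 1)
    for cancer, (missing, total) in reported_counts.items()
}

# Absolute survival gap (complete minus missing), in percentage points.
survival_gap = {
    cancer: round(complete - missing, 1)
    for cancer, (missing, complete) in two_year_os.items()
}

for cancer in reported_counts:
    print(
        f"{cancer}: {prevalence[cancer]}% missing data, "
        f"{survival_gap[cancer]} point gap in 2-year overall survival"
    )
```

The recomputed prevalences can be compared directly with the percentages printed in the abstract, and the per-cancer gaps make the "heterogeneous differences in overall survival" concrete.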
Pages: 17
Related papers
55 in total
[1]   Trends in Diagnosis and Disparities in Initial Management of High-Risk Prostate Cancer in the US [J].
Agrawal, Vishesh ;
Ma, Xiaoyue ;
Hu, Jim C. ;
Barbieri, Christopher E. ;
Nagar, Himanshu .
JAMA NETWORK OPEN, 2020, 3 (08) :E2014674
[2]  
American College of Surgeons, FACI ONC REG DAT STA
[3]  
[Anonymous], 2021, PUBLIC MISSING DATA
[4]   Feasibility of Using Real-World Data to Replicate Clinical Trial Evidence [J].
Bartlett, Victoria L. ;
Dhruva, Sanket S. ;
Shah, Nilay D. ;
Ryan, Patrick ;
Ross, Joseph S. .
JAMA NETWORK OPEN, 2019, 2 (10)
[5]   The National Cancer Data Base: A powerful initiative to improve cancer care in the United States [J].
Bilimoria, Karl Y. ;
Stewart, Andrew K. ;
Winchester, David P. ;
Ko, Clifford Y. .
ANNALS OF SURGICAL ONCOLOGY, 2008, 15 (03) :683-690
[6]   What's Lost in What's Missing: A Thoughtful Approach to Missing Data in the National Cancer Database [J].
Boffa, Daniel J. .
ANNALS OF SURGICAL ONCOLOGY, 2019, 26 (03) :709-710
[7]   Using the National Cancer Database for Outcomes Research [J].
Boffa, Daniel J. ;
Rosen, Joshua E. ;
Mallin, Katherine ;
Loomis, Ashley ;
Gay, Greer ;
Palis, Bryan ;
Thoburn, Kathleen ;
Gress, Donna ;
McKellar, Daniel P. ;
Shulman, Lawrence N. ;
Facktor, Matthew A. ;
Winchester, David P. .
JAMA ONCOLOGY, 2017, 3 (12) :1722-1728
[8]   Real-world data: towards achieving the achievable in cancer care [J].
Booth, Christopher M. ;
Karim, Safiya ;
Mackillop, William J. .
NATURE REVIEWS CLINICAL ONCOLOGY, 2019, 16 (05) :312-325
[9]   Deep learning and alternative learning strategies for retrospective real-world clinical data [J].
Chen, David ;
Liu, Sijia ;
Kingsbury, Paul ;
Sohn, Sunghwan ;
Storlie, Curtis B. ;
Habermann, Elizabeth B. ;
Naessens, James M. ;
Larson, David W. ;
Liu, Hongfang .
NPJ DIGITAL MEDICINE, 2019, 2 (1)
[10]   Prevalence and characteristics of cancer patients receiving care from single vs. multiple institutions [J].
Clarke, Christina A. ;
Glaser, Sally L. ;
Leung, Rita ;
Davidson-Allen, Kathleen ;
Gomez, Scarlett L. ;
Keegan, Theresa H. M. .
CANCER EPIDEMIOLOGY, 2017, 46 :27-33