The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis

Cited by: 2
Authors
Haque, Md Ashiqul [1]
Gedara, Muditha Lakmali Bodawatte [1]
Nickel, Nathan [1]
Turgeon, Maxime [2]
Lix, Lisa M. [1]
Affiliations
[1] Univ Manitoba, Dept Community Hlth Sci, Winnipeg, MB, Canada
[2] Univ Manitoba, Dept Stat, Winnipeg, MB, Canada
Funding
Canadian Institutes of Health Research
Keywords
Algorithms; Electronic health records; Review; Routinely collected health data; Validation study; ADMINISTRATIVE DATA; RISK-FACTORS; TOBACCO USE; CLINICAL TEXT; HEART-DISEASE; VALIDATION; RECORD; INFORMATION; ALGORITHMS; VETERANS;
DOI
10.1186/s12911-024-02416-3
Chinese Library Classification number
R-058 [];
Discipline classification code
Abstract
Background: Smoking is a risk factor for many chronic diseases. Multiple smoking status ascertainment algorithms have been developed for population-based electronic health databases such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than risk factors. We conducted a systematic review and meta-analysis of smoking status ascertainment algorithms to describe the characteristics and validity of these algorithms.
Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were followed. We searched articles published from 1990 to 2022 in EMBASE, MEDLINE, Scopus, and Web of Science with key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was descriptively analyzed. Sources of heterogeneity in validity measures were estimated using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool.
Results: The initial search yielded 2086 articles; 57 were selected for review and 116 algorithms were identified. Almost three-quarters (71.6%) of algorithms were based on EMR data. The algorithms were primarily constructed using diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also adopted. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value, sensitivity, and specificity were 0.843, 0.672, and 0.918, respectively. Algorithm sensitivity and specificity were highly variable and ranged from 3 to 100% and 36 to 100%, respectively. Model-based algorithms had significantly greater sensitivity (p = 0.006) than rule-based algorithms. Algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). The ROB was low in most of the articles (76.3%) that underwent the assessment.
Conclusions: Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and positive predictive value, but the data source influenced their validity. Algorithms based on machine-learning models for multiple linked data sources have improved validity.
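The validity measures named in the abstract (sensitivity, specificity, positive predictive value) all derive from a 2x2 table that compares an algorithm's smoking classification with a reference standard. The following is a minimal Python sketch under stated assumptions: the counts are hypothetical, and the names `ConfusionCounts` and `validity_measures` are illustrative only and do not come from the reviewed article.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 counts: algorithm result vs. reference standard."""
    tp: int  # algorithm says smoker, reference says smoker
    fp: int  # algorithm says smoker, reference says non-smoker
    fn: int  # algorithm says non-smoker, reference says smoker
    tn: int  # algorithm says non-smoker, reference says non-smoker


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a measure is undefined.
    return numerator / denominator if denominator else math.nan


def validity_measures(c: ConfusionCounts) -> dict[str, float]:
    """Compute the three validity measures named in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true smokers detected
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true non-smokers detected
        "ppv": _ratio(c.tp, c.tp + c.fp),          # flagged smokers who are smokers
    }


# Illustrative counts only; not data from any reviewed study.
example = ConfusionCounts(tp=80, fp=10, fn=20, tn=90)
measures = validity_measures(example)
```

These are single-study measures; the pooled estimates in the abstract come from a meta-analysis across studies, which this sketch does not attempt to reproduce.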
Pages: 15