Machine learning methods for propensity and disease risk score estimation in high-dimensional data: a plasmode simulation and real-world data cohort analysis

Cited by: 0
Authors
Guo, Yuchen [1]
Strauss, Victoria Y. [2]
Catala, Marti [1]
Jodicke, Annika M. [1]
Khalid, Sara [1]
Prieto-Alhambra, Daniel [1, 3]
Affiliations
[1] Univ Oxford, Ctr Stat Med, Nuffield Dept Orthopaed Rheumatol & Musculoskeleta, Pharmaco & Device Epidemiol Grp, Oxford, England
[2] Boehringer Ingelheim GmbH & Co KG, Ingelheim, Germany
[3] Erasmus MC, Dept Med Informat, Rotterdam, Netherlands
Keywords
treatment effect; observational research; machine learning; propensity scores; disease risk scores; negative control; REGRESSION; SHRINKAGE; SELECTION; SAFETY; MODEL;
DOI
10.3389/fphar.2024.1395707
Chinese Library Classification (CLC) number
R9 [Pharmacy];
Discipline classification code
1007;
Abstract
Introduction: Machine learning (ML) methods are promising and scalable alternatives for propensity score (PS) estimation, but their comparative performance in disease risk score (DRS) estimation remains unexplored.
Methods: We used real-world data comparing antihypertensive users to non-users with 69 negative control outcomes, and plasmode simulations, to study the performance of ML methods in PS and DRS estimation. We conducted a cohort study using UK primary care records. Further, we conducted a plasmode simulation with synthetic treatment and outcome mimicking empirical data distributions. We compared four PS and DRS estimation methods: (1) Reference: logistic regression including clinically chosen confounders; (2) logistic regression with L1 regularisation (LASSO); (3) multi-layer perceptron (MLP); (4) Extreme Gradient Boosting (XGBoost). We estimated covariate balance, coverage of the null effect of negative control outcomes (real-world data), and bias, defined as the absolute difference between observed and true effects (plasmode simulation). A total of 632,201 antihypertensive users and non-users were included.
Results: ML methods outperformed the reference method for PS estimation in some scenarios, both in terms of covariate balance and coverage/bias. Specifically, XGBoost achieved the best performance. DRS-based methods performed worse than PS-based methods in all tested scenarios.
Discussion: We found that ML methods could be reliable alternatives for PS estimation. ML-based DRS methods performed worse than PS-based ones, likely because the outcomes were rare.
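The abstract names covariate balance as one of its evaluation criteria. The following is a minimal sketch of that idea only, not the authors' pipeline: it assumes a synthetic cohort with a single confounder, a plain logistic-regression PS fitted by Newton-Raphson, and inverse probability of treatment weighting (the abstract does not say how the scores were applied). All names (`fit_logistic`, `smd`, `iptw`) are hypothetical. Balance is summarised with the standardised mean difference (SMD), where values close to zero indicate similar covariate distributions in the two groups.

```python
import math
import random

random.seed(0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic cohort (assumption): one standard-normal confounder x drives treatment.
n = 5000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ts = [1 if random.random() < sigmoid(x) else 0 for x in xs]


def fit_logistic(xs, ts, n_iter=25):
    """Fit P(T=1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += t - p              # gradient w.r.t. intercept
            g1 += (t - p) * x        # gradient w.r.t. slope
            h00 += w                 # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def smd(xs, ts, ws):
    """Weighted standardised mean difference of x, treated minus untreated."""
    stats = {}
    for g in (0, 1):
        sw = sum(w for t, w in zip(ts, ws) if t == g)
        mean = sum(w * x for x, t, w in zip(xs, ts, ws) if t == g) / sw
        var = sum(w * (x - mean) ** 2 for x, t, w in zip(xs, ts, ws) if t == g) / sw
        stats[g] = (mean, var)
    pooled_sd = math.sqrt((stats[1][1] + stats[0][1]) / 2.0)
    return (stats[1][0] - stats[0][0]) / pooled_sd


# Estimate the propensity score, then build inverse-probability weights.
b0, b1 = fit_logistic(xs, ts)
ps = [sigmoid(b0 + b1 * x) for x in xs]
iptw = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, ts)]

smd_before = smd(xs, ts, [1.0] * n)   # unweighted: confounded comparison
smd_after = smd(xs, ts, iptw)         # weighted by the estimated PS

print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```

In this toy setting the weighted SMD should fall well below the unweighted one. Replacing `fit_logistic` with a LASSO, MLP, or gradient-boosting classifier would change only how `ps` is produced; the balance check itself stays the same.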
Pages: 10