Can Machine Learning Help Identify Patients at Risk for Recurrent Sexually Transmitted Infections?

Cited by: 11
Authors
Elder, Heather R. [1]
Gruber, Susan [2]
Willis, Sarah J. [1, 3, 4]
Cocoros, Noelle [3, 4]
Callahan, Myfanwy [5]
Flagg, Elaine W. [6]
Klompas, Michael [3, 4, 7]
Hsu, Katherine K. [1, 8]
Affiliations
[1] Massachusetts Dept Publ Hlth, Bur Infect Dis & Lab Sci, Boston, MA USA
[2] Putnam Data Sci LLC, Cambridge, MA USA
[3] Harvard Med Sch, Dept Populat Med, Boston, MA 02115 USA
[4] Harvard Pilgrim Hlth Care Inst, Boston, MA USA
[5] Atrius Hlth, Boston, MA USA
[6] Ctr Dis Control & Prevent, Div STD Prevent, Natl Ctr HIV AIDS Viral Hepatitis STD & TB Preven, Atlanta, GA USA
[7] Brigham & Womens Hosp, Dept Med, 75 Francis St, Boston, MA 02115 USA
[8] Boston Univ, Med Ctr, Sect Pediat Infect Dis, Boston, MA USA
Keywords
DISEASE SURVEILLANCE; CHLAMYDIA; MODELS
DOI
10.1097/OLQ.0000000000001264
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Background: A substantial fraction of sexually transmitted infections (STIs) occur in patients who have previously been treated for an STI. We assessed whether routine electronic health record (EHR) data can predict which patients presenting with an incident STI are at greatest risk for additional STIs in the next 1 to 2 years.
Methods: We used structured EHR data on patients 15 years or older who received an incident STI diagnosis in 2008 to 2015 in eastern Massachusetts. We applied machine learning algorithms to model the risk of acquiring >= 1 or >= 2 additional STI diagnoses within 365 or 730 days after the initial diagnosis, using more than 180 different EHR variables. We performed a sensitivity analysis incorporating state health department surveillance data to assess whether identifying STI cases more accurately improved algorithm performance.
Results: We identified 8723 incident episodes of laboratory-confirmed gonorrhea, chlamydia, or syphilis. Bayesian Additive Regression Trees, the best-performing single algorithm, had a cross-validated area under the receiver operating characteristic curve (AUC) of 0.75. Receiver operating characteristic curves for this algorithm showed a poor balance between sensitivity and positive predictive value (PPV): a predicted probability threshold with a sensitivity of 91.5% had a corresponding PPV of 3.9%, whereas a higher threshold with a PPV of 29.5% had a sensitivity of 11.7%. Attempting to improve the classification of patients with and without repeat STI diagnoses by incorporating health department surveillance data had minimal impact on the cross-validated AUC.
Conclusions: Machine learning algorithms using structured EHR data did not differentiate well between patients with and without repeat STI diagnoses. Alternative strategies that can account for sociobehavioral characteristics could be explored.
Pages: 56-62
Page count: 7
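The trade-off between sensitivity and PPV described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or data: the function names (`sensitivity_and_ppv`, `ppv_from_rates`), the toy labels and scores, and the specificity and prevalence values are all hypothetical, chosen only to show why a low threshold can give high sensitivity but low PPV when the outcome is uncommon.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, PPV) when every score >= threshold is flagged."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # true positive: flagged and outcome occurred
        elif flagged and label == 0:
            fp += 1  # false positive: flagged but no outcome
        elif not flagged and label == 1:
            fn += 1  # false negative: missed outcome
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV from Bayes' rule, given sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical toy data: 2 of 10 patients have the outcome (label 1).
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.4, 0.1, 0.35, 0.6, 0.25, 0.15]

low = sensitivity_and_ppv(y_true, scores, threshold=0.2)    # catches both cases, many false alarms
high = sensitivity_and_ppv(y_true, scores, threshold=0.85)  # few false alarms, misses one case

# Illustrative (assumed) rates: with a rare outcome, even high sensitivity yields a low PPV.
rare_outcome_ppv = ppv_from_rates(sensitivity=0.9, specificity=0.5, prevalence=0.02)
```

On the toy data, lowering the threshold raises sensitivity and lowers PPV, which is the same direction of trade-off the abstract reports for the two thresholds.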