Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation

Cited by: 20
Authors
He, Jiajin [1]
Li, Jinhua [2]
Jiang, Siqing [1]
Cheng, Wei [3]
Jiang, Jun [3]
Xu, Yun [3]
Yang, Jiezhe [3]
Zhou, Xin [3]
Chai, Chengliang [3]
Wu, Chao [4]
Affiliations
[1] Zhejiang Univ, Sch Publ Hlth, Sch Med, Hangzhou, Peoples R China
[2] Zhejiang Univ, Sch Software Technol, Ningbo, Peoples R China
[3] Zhejiang Prov Ctr Dis Control & Prevent, Hangzhou, Peoples R China
[4] Zhejiang Univ, Sch Publ Affairs, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
machine learning; HIV; MSM; prediction; models; RISK; MSM; HIV/AIDS; CHINA;
DOI
10.3389/fpubh.2022.967681
Chinese Library Classification number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Background: The continuing growth of HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. Machine learning algorithms are increasingly used in disease diagnosis and prediction. We aimed to develop and validate machine learning models for predicting HIV infection among MSM that can identify individuals at increased risk of HIV acquisition for transmission-reduction interventions.
Methods: We extracted data from MSM sentinel surveillance in Zhejiang province from 2018 to 2020. Univariate logistic regression was used to select significant variables in the 2018-2019 data (P < 0.05). After data processing and feature selection, we divided the model development data into two groups by stratified random sampling: training data (70%) and testing data (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Model performance was evaluated using accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). We then compared three commonly used machine learning algorithms, decision tree (DT), support vector machine (SVM), and random forest (RF), with logistic regression (LR). Finally, the four models were validated prospectively with 2020 data from Zhejiang province.
Results: A total of 6,346 MSM were included in the model development data, 372 of whom were diagnosed with HIV. In feature selection, 12 variables were selected as model predictors. Compared with LR, the DT, SVM, and RF algorithms improved classification performance on SMOTE-processed data; the AUCs for LR, DT, SVM, and RF were 0.778, 0.856, 0.887, and 0.942, respectively. RF was the best-performing algorithm (accuracy = 0.871, precision = 0.960, recall = 0.775, F-measure = 0.858, and AUC = 0.942). The RF model still performed well on prospective validation (AUC = 0.846).
Conclusion: Machine learning models performed substantially better than the conventional LR model, and RF should be considered for HIV infection prediction tools for Chinese MSM. Further studies are needed to optimize and promote these algorithms and to evaluate their impact on HIV prevention among MSM.
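The steps named in the abstract (stratified 70/30 split, SMOTE oversampling, and the evaluation metrics) can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function names `stratified_split`, `smote`, `f_measure`, `classification_metrics`, and `auc` are hypothetical, the choice of k = 5 neighbours and Euclidean distance is an assumption, and the abstract does not say which software or parameter settings were used.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each class separately so that
    both parts keep roughly the original class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples, each placed on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (assumes numeric features and at least two samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Squared Euclidean distance is sufficient for ranking neighbours.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # relative position between base and other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As a consistency check on the reported RF results, `round(f_measure(0.960, 0.775), 3)` gives 0.858, which agrees with the F-measure stated in the abstract. In a pipeline of this kind, oversampling is normally applied to the training portion only; whether the study did so is not stated in the abstract.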
Pages: 10