Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach

Cited by: 14
Authors
Tran, Van [1]
Saad, Tazmilur [1]
Tesfaye, Mehret [2]
Walelign, Sosina [2]
Wordofa, Moges [2]
Abera, Dessie [2]
Desta, Kassu [2]
Tsegaye, Aster [2]
Ay, Ahmet [1, 3]
Taye, Bineyam [3]
Affiliations
[1] Colgate Univ, Dept Math, 13 Oak Dr, Hamilton, NY 13346 USA
[2] Addis Ababa Univ, Coll Hlth Sci, Dept Med Lab Sci, Addis Ababa, Ethiopia
[3] Colgate Univ, Dept Biol, 13 Oak Dr, Hamilton, NY 13346 USA
Keywords
Machine learning; H. pylori infection; Classification; Feature selection; Logistic regression; School children; Ethiopia; WATER SOURCE; INFECTION; EPIDEMIOLOGY; CHILDREN; POPULATION; SEROPREVALENCE; REGRESSION; COUNTRIES; SELECTION
DOI
10.1186/s12879-022-07625-7
Chinese Library Classification
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Background: Although previous epidemiological studies have examined potential risk factors that increase the likelihood of acquiring Helicobacter pylori infection, most of these analyses used conventional statistical models, including logistic regression, and did not benefit from advanced machine learning techniques.
Objective: We examined H. pylori infection risk factors among school children using machine learning algorithms, both to identify important risk factors and to determine whether machine learning can predict H. pylori infection status.
Methods: We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data and combined the results of these runs for each combination of feature selection algorithm (e.g., Information Gain) and classification algorithm (e.g., Support Vector Machines).
Results: The XGBoost classifier predicted H. pylori infection status with the highest accuracy, 77%, a 13-percentage-point improvement over the baseline accuracy of always guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors performed worst among all classifiers. Similar patterns were observed with the F1-score and the area under the receiver operating characteristic curve (AUROC) as evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most consistently identified risk factor for H. pylori infection, regardless of the feature selection method. Our machine learning algorithms also identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, when a lower robustness threshold was used, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression.
Conclusion: This study provides evidence that machine learning approaches are positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to logistic regression, and could therefore serve as an alternative method.
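The evaluation protocol described in the Methods (five runs of tenfold cross-validation over a feature selection plus classification pair, compared against a most-frequent-class baseline) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the data are synthetic stand-ins generated to match the reported size (954 samples, 27 features, about 64% in the majority class), mutual information is used as an approximation of Information Gain, logistic regression is swapped in for the classifiers named in the abstract, and the stratified splitting, the choice of k=10 selected features, and all variable names are assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the survey data (assumption): 954 samples,
# 27 features, roughly 64% of samples in the majority (negative) class.
X, y = make_classification(
    n_samples=954, n_features=27, n_informative=5,
    weights=[0.64, 0.36], flip_y=0.0, random_state=0,
)

# Five runs of tenfold cross-validation = 50 train/test splits.
# Stratification is an assumption; the abstract does not specify it.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)

# Feature selection lives inside the pipeline so it is refit on each
# training fold and never sees the corresponding test fold.
model = Pipeline([
    ("select", SelectKBest(partial(mutual_info_classif, random_state=0), k=10)),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")

model_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
baseline_scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")

print(f"model accuracy:    {model_scores.mean():.3f}")
print(f"baseline accuracy: {baseline_scores.mean():.3f}")
```

Averaging the 50 per-fold scores is one plausible way to combine the results of the runs; swapping the `select` or `classify` step gives a different feature selection and classifier combination under the same splits.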
Pages: 14
Related papers
50 in total
  • [1] Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach
    Van Tran
    Tazmilur Saad
    Mehret Tesfaye
    Sosina Walelign
    Moges Wordofa
    Dessie Abera
    Kassu Desta
    Aster Tsegaye
    Ahmet Ay
    Bineyam Taye
    BMC Infectious Diseases, 22
  • [2] Machine Learning-Based Prediction of Helicobacter pylori Infection Study in Adults
    Liu, Min
    Liu, Shiyu
    Lu, Zhaolin
    Chen, Hu
    Xu, Yuling
    Gong, Xue
    Chen, Guangxia
    MEDICAL SCIENCE MONITOR, 2024, 30
  • [3] Accuracy of rapid Helicobacter pylori antigen tests for the surveillance of the updated prevalence of H. pylori in Taiwan
    Fang, Yu-Jen
    Chen, Mei-Jyh
    Chen, Chieh-Chang
    Lee, Ji-Yuh
    Yang, Tsung-Hua
    Yu, Chien-Chun
    Chiu, Min-Chin
    Kuo, Chia-Chi
    Weng, Yu-Jong
    Bair, Ming-Jong
    Wu, Ming-Shiang
    Luo, Jiing-Chyuan
    Liou, Jyh-Ming
    JOURNAL OF THE FORMOSAN MEDICAL ASSOCIATION, 2020, 119 (11) : 1626 - 1633
  • [4] Helicobacter pylori Eradication Therapy for Functional Dyspepsia: A Meta-Analysis by Region and H. pylori Prevalence
    Kang, Seung Joo
    Park, Boram
    Shin, Cheol Min
    JOURNAL OF CLINICAL MEDICINE, 2019, 8 (09)
  • [5] Association between Helicobacter pylori seropositivity and mild to moderate COPD: clinical implications in an Asian country with a high prevalence of H. pylori
    Lee, Ha Youn
    Kim, Ji Won
    Lee, Jung Kyu
    Heo, Eun Young
    Chung, Hee Soon
    Kim, Deog Keom
    INTERNATIONAL JOURNAL OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE, 2016, 11 : 2055 - 2062
  • [6] Machine learning-based risk factor analysis and prevalence prediction of intestinal parasitic infections using epidemiological survey data
    Zafar, Aziz
    Attia, Ziad
    Tesfaye, Mehret
    Walelign, Sosina
    Wordofa, Moges
    Abera, Dessie
    Desta, Kassu
    Tsegaye, Aster
    Ay, Ahmet
    Taye, Bineyam
    PLOS NEGLECTED TROPICAL DISEASES, 2022, 16 (06)
  • [7] Prevalence and Risk Factors of H. pylori from Dyspeptic Patients in Northwest Ethiopia: A Hospital Based Cross-sectional Study
    Abebaw, Wubejig
    Kibret, Mulugeta
    Abera, Bayeh
    ASIAN PACIFIC JOURNAL OF CANCER PREVENTION, 2014, 15 (11) : 4459 - 4463
  • [8] Prevalence of Helicobacter pylori Infection in High-school Students on Lanyu Island, Taiwan: Risk Factor Analysis and Effect on Growth
    Chi, Hsin
    Bair, Ming-Jong
    Wu, Ming-Shiang
    Chiu, Nan-Chang
    Hsiao, Ya-Chun
    Chang, Kuan-Yu
    JOURNAL OF THE FORMOSAN MEDICAL ASSOCIATION, 2009, 108 (12) : 929 - 936
  • [9] Machine Learning-Based Risk Prediction of Discharge Status for Sepsis
    Cai, Kaida
    Lou, Yuqing
    Wang, Zhengyan
    Yang, Xiaofang
    Zhao, Xin
    ENTROPY, 2024, 26 (08)
  • [10] Prevalence and risk factors of H. pylori infection among outpatient in Karaganda city (Kazakhstan)
    Seisenbekova, Aizhan
    Laryushina, Yelena
    Yukhnevich, Yekaterina
    Lavrinenko, Alyona
    Shkreba, Alexey
    FUTURE SCIENCE OA, 2025, 11 (01)