Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach

Cited by: 14
Authors
Tran, Van [1]
Saad, Tazmilur [1]
Tesfaye, Mehret [2]
Walelign, Sosina [2]
Wordofa, Moges [2]
Abera, Dessie [2]
Desta, Kassu [2]
Tsegaye, Aster [2]
Ay, Ahmet [1, 3]
Taye, Bineyam [3]
Affiliations
[1] Colgate Univ, Dept Math, 13 Oak Dr, Hamilton, NY 13346 USA
[2] Addis Ababa Univ, Coll Hlth Sci, Dept Med Lab Sci, Addis Ababa, Ethiopia
[3] Colgate Univ, Dept Biol, 13 Oak Dr, Hamilton, NY 13346 USA
Keywords
Machine learning; H. pylori infection; Classification; Feature selection; Logistic regression; School children; Ethiopia; WATER SOURCE; INFECTION; EPIDEMIOLOGY; CHILDREN; POPULATION; SEROPREVALENCE; REGRESSION; COUNTRIES; SELECTION
DOI
10.1186/s12879-022-07625-7
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Subject classification code
100401
Abstract
Background: Although previous epidemiological studies have examined the potential risk factors that increase the likelihood of acquiring Helicobacter pylori infections, most of these analyses have used conventional statistical models, including logistic regression, and have not benefited from advanced machine learning techniques.
Objective: We examined H. pylori infection risk factors among school children using machine learning algorithms, both to identify important risk factors and to determine whether machine learning can be used to predict H. pylori infection status.
Methods: We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data and combined the results of these runs for each combination of feature selection algorithm (e.g., Information Gain) and classification algorithm (e.g., Support Vector Machines).
Results: The XGBoost classifier had the highest accuracy in predicting H. pylori infection status, at 77%, a 13-percentage-point improvement over the baseline accuracy of always guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors showed the worst performance of all classifiers. Similar performance was observed with the F1-score and area under the receiver operating characteristic curve (AUROC) evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most common risk factor for H. pylori infection, regardless of the feature selection method. Additionally, our machine learning algorithms identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, when a lower robustness threshold was used, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression.
Conclusion: This study provides evidence that machine learning approaches are positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to logistic regression, so they could be used as an alternative method.
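The evaluation protocol in the abstract (five runs of tenfold cross-validation, compared against a majority-class baseline) can be illustrated with a minimal sketch. This is not the authors' code: the labels are synthetic, generated only to match the reported sample size (954) and the approximate share of negatives (64%), and the helper names `repeated_kfold_indices` and `majority_baseline_accuracy` are hypothetical. The folds here are plain shuffled folds; the abstract does not say whether the study stratified them.

```python
import random
from collections import Counter


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats runs of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Fold i takes every n_splits-th shuffled index starting at i,
        # so fold sizes differ by at most one sample.
        folds = [order[i::n_splits] for i in range(n_splits)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
            yield train_idx, test_idx


def majority_baseline_accuracy(labels, train_idx, test_idx):
    """Predict the most frequent training label for every test sample."""
    majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
    return sum(labels[j] == majority for j in test_idx) / len(test_idx)


# Synthetic labels (assumption): 0 = H. pylori negative, 1 = positive.
n = 954
n_negative = round(0.64 * n)
labels = [0] * n_negative + [1] * (n - n_negative)

splits = list(repeated_kfold_indices(n))
scores = [majority_baseline_accuracy(labels, tr, te) for tr, te in splits]
mean_baseline = sum(scores) / len(scores)

print(f"evaluations: {len(splits)}")
print(f"mean majority-class baseline accuracy: {mean_baseline:.3f}")
```

Any classifier scored on the same 50 train/test splits can be compared directly with `mean_baseline`; the abstract's reported 77% accuracy sits 13 percentage points above a baseline of about 64%.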
Pages: 14
Related papers
50 in total
  • [31] A Machine Learning-Based Prediction Model for Cardiovascular Risk in Women With Preeclampsia
    Wang, Guan
    Zhang, Yanbo
    Li, Sijin
    Zhang, Jun
    Jiang, Dongkui
    Li, Xiuzhen
    Li, Yulin
    Du, Jie
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2021, 8
  • [32] A Machine Learning-Based Framework for the Prediction of Cervical Cancer Risk in Women
    Kaushik, Keshav
    Bhardwaj, Akashdeep
    Bharany, Salil
    Alsharabi, Naif
    Rehman, Ateeq Ur
    Eldin, Elsayed Tag
    Ghamry, Nivin A.
    SUSTAINABILITY, 2022, 14 (19)
  • [33] Birthweight Range Prediction and Classification: A Machine Learning-Based Sustainable Approach
    Alabbad, Dina A.
    Ajibi, Shahad Y.
    Alotaibi, Raghad B.
    Alsqer, Noura K.
    Alqahtani, Rahaf A.
    Felemban, Noor M.
    Rahman, Atta
    Aljameel, Sumayh S.
    Ahmed, Mohammed Imran Basheer
    Youldash, Mustafa M.
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2024, 6 (02) : 770 - 788
  • [34] Helicobacter pylori infection as a risk factor for diabetes: a meta-analysis of case-control studies
    Mansori, Kamyar
    Moradi, Yousef
    Naderpour, Sara
    Rashti, Roya
    Moghaddam, Ali Baradaran
    Saed, Lotfolah
    Mohammadi, Hedyeh
    BMC GASTROENTEROLOGY, 2020, 20 (01)
  • [35] Risk Factor Assessment of Helicobacter pylori Infection in a Rural Community of People with Gastritis: A Community Based Cross-Sectional Study
    Jaiswal, Suresh
    Tiwari, Bishnu Raj
    Sharma, Dinesh C.
    JOURNAL OF PHARMACEUTICAL RESEARCH INTERNATIONAL, 2021, 33 (20A) : 56 - 63
  • [36] The effect of cranberry supplementation on Helicobacter pylori eradication in H. pylori positive subjects: a systematic review and meta-analysis of randomised controlled trials
    Nikbazm, Ronak
    Rahimi, Zahra
    Moradi, Yousef
    Alipour, Meysam
    Shidfar, Farzad
    BRITISH JOURNAL OF NUTRITION, 2022, 128 (06) : 1090 - 1099
  • [37] A machine learning-based approach to prognostic analysis of thoracic transplantations
    Delen, Dursun
    Oztekin, Asil
    Kong, Zhenyu
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2010, 49 (01) : 33 - 42
  • [38] Machine Learning-based Cascade Size Prediction Analysis in Power Systems
    Sami, Naeem Md
    Naeini, Mia
    2023 NORTH AMERICAN POWER SYMPOSIUM, NAPS, 2023
  • [39] A machine learning-based diabetes risk prediction modeling study
    Ming, Jiexiu
    Xu, Junyi
    Zhang, Miaomiao
    Li, Ningyu
    Yan, Xu
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON COMPUTER AND MULTIMEDIA TECHNOLOGY, ICCMT 2024, 2024 : 363 - 369
  • [40] Analysis on Benefits and Costs of Machine Learning-Based Early Hospitalization Prediction
    Kim, Eunbi
    Han, Kap Su
    Cheong, Taesu
    Lee, Sung Woo
    Eun, Joonyup
    Kim, Su Jin
    IEEE ACCESS, 2022, 10 : 32479 - 32493