Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach

Cited by: 14
Authors
Tran, Van [1]
Saad, Tazmilur [1]
Tesfaye, Mehret [2]
Walelign, Sosina [2]
Wordofa, Moges [2]
Abera, Dessie [2]
Desta, Kassu [2]
Tsegaye, Aster [2]
Ay, Ahmet [1, 3]
Taye, Bineyam [3]
Affiliations
[1] Colgate Univ, Dept Math, 13 Oak Dr, Hamilton, NY 13346 USA
[2] Addis Ababa Univ, Coll Hlth Sci, Dept Med Lab Sci, Addis Ababa, Ethiopia
[3] Colgate Univ, Dept Biol, 13 Oak Dr, Hamilton, NY 13346 USA
Keywords
Machine learning; H. pylori infection; Classification; Feature selection; Logistic regression; School children; Ethiopia; WATER SOURCE; INFECTION; EPIDEMIOLOGY; CHILDREN; POPULATION; SEROPREVALENCE; REGRESSION; COUNTRIES; SELECTION
DOI
10.1186/s12879-022-07625-7
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Background: Although previous epidemiological studies have examined potential risk factors for acquiring Helicobacter pylori infection, most of these analyses have used conventional statistical models, including logistic regression, and have not benefited from advanced machine learning techniques.
Objective: We examined H. pylori infection risk factors among school children using machine learning algorithms, both to identify important risk factors and to determine whether machine learning can predict H. pylori infection status.
Methods: We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data and combined the results of these runs for each combination of feature selection algorithm (e.g., Information Gain) and classification algorithm (e.g., Support Vector Machines).
Results: The XGBoost classifier had the highest accuracy in predicting H. pylori infection status, at 77%, a 13-percentage-point improvement over the baseline accuracy of guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors performed worst of all classifiers. Similar performance was observed with the F1-score and area under the receiver operating characteristic curve (AUROC) evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most commonly selected risk factor for H. pylori infection, regardless of the feature selection method. Our machine learning algorithms also identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. With a lower robustness threshold, however, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression.
Conclusion: This study provides evidence that machine learning approaches can uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to logistic regression, so they could be used as an alternative method.
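The evaluation protocol described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The function names (`majority_baseline`, `repeated_kfold`, `robust_features`) are hypothetical, the class counts are placeholders obtained by rounding 64% of 954, and "robustness" is assumed here to mean the fraction of cross-validation splits in which a feature is selected, which the abstract does not define.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, test_indices) for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices and partitions them into
    n_splits disjoint test folds; the training set is everything else.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_splits):
            # Every n_splits-th index, starting at `fold`, forms one test fold.
            yield repeat, fold, indices[fold::n_splits]


def robust_features(selected_per_split, cutoff=0.75):
    """Return features selected in at least `cutoff` of the splits.

    Assumption: robustness = selection frequency across splits.
    """
    n_splits = len(selected_per_split)
    counts = Counter(f for sel in selected_per_split for f in set(sel))
    return sorted(f for f, c in counts.items() if c / n_splits >= cutoff)


# Placeholder labels: roughly 64% negative out of 954 (illustrative counts only).
labels = [0] * 611 + [1] * 343
baseline = majority_baseline(labels)
splits = list(repeated_kfold(len(labels)))
print(f"baseline accuracy: {baseline:.2f}")
print(f"number of train/test splits: {len(splits)}")
print(f"gain of a 77%-accurate model: {0.77 - baseline:.2f}")
```

A classifier is worth reporting only if it beats `majority_baseline`; lowering `cutoff` in `robust_features` admits more features, which mirrors the abstract's observation that a lower robustness threshold surfaces more candidate risk factors.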
Pages: 14
Related papers
50 in total
  • [21] Machine learning-based prediction models for renal impairment in Chinese adults with hyperuricaemia: risk factor analysis
    Wu, Tianchen
    Yang, Hui
    Chen, Jinbin
    Kong, Wenwen
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [22] The relationship of Helicobacter pylori infection and the risk of colon neoplasia based on meta-analysis
    Liu, Chao
    Zheng, Pengyuan
    INTERNATIONAL JOURNAL OF CLINICAL AND EXPERIMENTAL MEDICINE, 2016, 9 (02): 2293-2300
  • [23] Quantification and cultivation of Helicobacter pylori (H. pylori) from various urban water environments: A comprehensive analysis of precondition methods and sample characteristics
    Ma, Chen
    Zhou, Fangyuan
    Lu, Dingnan
    Xu, Shengliang
    Luo, Jiayue
    Gan, Huihui
    Gao, Doudou
    Yao, Zhiyuan
    He, Weidong
    Kurup, Pradeep U.
    Zhu, David Z.
    ENVIRONMENT INTERNATIONAL, 2024, 187
  • [24] NeoAI 1.0: Machine learning-based paradigm for prediction of neonatal and infant risk of death
    Teji, Jagjit S.
    Jain, Suneet
    Gupta, Suneet K.
    Suri, Jasjit S.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 147
  • [25] Machine learning-based risk prediction of malignant arrhythmia in hospitalized patients with heart failure
    Wang, Qi
    Li, Bin
    Chen, Kangyu
    Yu, Fei
    Su, Hao
    Hu, Kai
    Liu, Zhiquan
    Wu, Guohong
    Yan, Ji
    Su, Guohai
    ESC HEART FAILURE, 2021, 8 (06): 5363-5371
  • [26] Obesity Is a Risk Factor Associated with H. pylori-negative MALT Lymphoma of Stomach
    Mai, Brenda
    Friscia, Michaelangelo
    Elzamly, Shaimaa
    Thomas-Ogunniyi, Jaiyeola
    Wahed, Amer
    Nguyen, Andy
    Hu, Zhihong
    Cai, Zhenjian
    Chen, Lei
    ANNALS OF CLINICAL AND LABORATORY SCIENCE, 2021, 51 (05): 609-614
  • [27] Risk factor, diagnosis, and current treatment of H. pylori Infection in Indonesia: A Literature Review
    Iman, Rizani Putri
    Junita, Tiroy
    Rachman, Rinaldo Indra
    Syam, Ari Fahrial
    ACTA MEDICA INDONESIANA, 2021, 53 (03): 331-338
  • [28] Machine learning-based models for the prediction of breast cancer recurrence risk
    Zuo, Duo
    Yang, Lexin
    Jin, Yu
    Qi, Huan
    Liu, Yahui
    Ren, Li
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2023, 23 (01)
  • [29] Machine Learning-Based Approach for Hardware Faults Prediction
    Khalil, Kasem
    Eldash, Omar
    Kumar, Ashok
    Bayoumi, Magdy
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, 67 (11): 3880-3892
  • [30] A Machine Learning-Based Approach for Crop Price Prediction
    Gururaj, H. L.
    Janhavi, V.
    Lakshmi, H.
    Soundarya, B. C.
    Paramesha, K.
    Ramesh, B.
    Rajendra, A. B.
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024, 33 (03)