Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach

Cited by: 14
Authors
Tran, Van [1]
Saad, Tazmilur [1]
Tesfaye, Mehret [2]
Walelign, Sosina [2]
Wordofa, Moges [2]
Abera, Dessie [2]
Desta, Kassu [2]
Tsegaye, Aster [2]
Ay, Ahmet [1, 3]
Taye, Bineyam [3]
Affiliations
[1] Colgate Univ, Dept Math, 13 Oak Dr, Hamilton, NY 13346 USA
[2] Addis Ababa Univ, Coll Hlth Sci, Dept Med Lab Sci, Addis Ababa, Ethiopia
[3] Colgate Univ, Dept Biol, 13 Oak Dr, Hamilton, NY 13346 USA
Keywords
Machine learning; H. pylori infection; Classification; Feature selection; Logistic regression; School children; Ethiopia; WATER SOURCE; INFECTION; EPIDEMIOLOGY; CHILDREN; POPULATION; SEROPREVALENCE; REGRESSION; COUNTRIES; SELECTION
DOI
10.1186/s12879-022-07625-7
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Subject classification code
100401
Abstract
Background: Although previous epidemiological studies have examined the potential risk factors that increase the likelihood of acquiring Helicobacter pylori infections, most of these analyses have used conventional statistical models, including logistic regression, and have not benefited from advanced machine learning techniques.

Objective: We examined H. pylori infection risk factors among school children using machine learning algorithms, both to identify important risk factors and to determine whether machine learning can be used to predict H. pylori infection status.

Methods: We applied feature selection and classification algorithms to data from a school-based cross-sectional survey in Ethiopia. The data set included 954 school children with 27 sociodemographic and lifestyle variables. We conducted five runs of tenfold cross-validation on the data and combined the results of these runs for each combination of feature selection algorithm (e.g., Information Gain) and classification algorithm (e.g., Support Vector Machines).

Results: The XGBoost classifier predicted H. pylori infection status with the highest accuracy, 77%, a 13-percentage-point improvement over the baseline accuracy of always guessing the most frequent class (64% of the samples were H. pylori negative). K-Nearest Neighbors performed worst of all classifiers. Similar performance was observed using the F1-score and the area under the receiver operating characteristic curve (AUROC) as evaluation metrics. Among all features, place of residence (with urban residence increasing risk) was the most commonly selected risk factor for H. pylori infection, regardless of the feature selection method. Our machine learning algorithms also identified other important risk factors for H. pylori infection, such as electricity usage in the home, toilet type, and waste disposal location. Using a 75% cutoff for robustness, machine learning identified five of the eight significant features found by traditional multivariate logistic regression. However, when a lower robustness threshold was used, machine learning approaches identified more H. pylori risk factors than multivariate logistic regression and suggested risk factors not detected by logistic regression.

Conclusion: This study provides evidence that machine learning approaches are positioned to uncover H. pylori infection risk factors and predict H. pylori infection status. These approaches identify similar risk factors and predict infection with accuracy comparable to that of logistic regression, so they could be used as an alternative method.
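
The evaluation protocol summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. It assumes that "robustness" means the share of cross-validation folds in which a feature is selected, which the abstract does not define. The function names, feature names, and toy selections are all hypothetical; only the sample size (954), the five runs of tenfold cross-validation, the 75% cutoff, and the 64% and 77% accuracies come from the abstract.

```python
import random
from collections import Counter


def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats reshuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices into n_splits nearly equal folds.
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, folds[k]


def robust_features(selected_per_fold, cutoff=0.75):
    """Return {feature: selection share} for features chosen in >= cutoff of folds.

    Assumption: robustness = fraction of folds in which a feature was selected.
    """
    counts = Counter(f for sel in selected_per_fold for f in set(sel))
    n_folds = len(selected_per_fold)
    return {f: c / n_folds for f, c in counts.items() if c / n_folds >= cutoff}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Five runs of tenfold cross-validation over 954 samples give 50 train/test splits.
splits = list(repeated_kfold(954, n_splits=10, n_repeats=5))

# Toy (invented) per-fold selections standing in for a real feature selector.
toy_selections = (
    [["place_of_residence", "toilet_type"]] * 40
    + [["place_of_residence", "electricity_use"]] * 10
)
robust = robust_features(toy_selections, cutoff=0.75)

# Toy labels with the class balance reported in the abstract (64% negative).
baseline = majority_baseline_accuracy([0] * 64 + [1] * 36)
gain_points = (0.77 - baseline) * 100  # improvement in percentage points

print(len(splits), sorted(robust), round(gain_points, 1))
```

Lowering `cutoff` admits features that are selected less consistently, which mirrors the abstract's observation that a lower robustness threshold surfaces more candidate risk factors. In a real analysis the toy selections would be replaced by the output of a feature selector fitted on each training fold only, so that no test-fold information leaks into feature selection.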
Pages: 14