Comparative analysis of machine learning and ensemble approaches for hepatitis B prediction using data mining with synthetic minority oversampling technique

Times cited: 0
Authors
Alizargar, Azadeh [1]
Chang, Yang-Lang [1]
Tan, Tan-Hsu [1]
Liu, Tsung-Yu [2]
Affiliations
[1] Natl Taipei Univ Technol, Coll Elect Engn & Comp Sci, Dept Elect Engn, Taipei 10608, Taiwan
[2] Lunghwa Univ Sci & Technol, Dept Multimedia & Game Sci, Taoyuan 333326, Taiwan
Keywords
Hepatitis B; Liver damage; Early detection; Machine learning; Ensemble model; SMOTE; RISK; DIAGNOSIS; VIRUS
DOI
10.1007/s12553-023-00802-x
Chinese Library Classification number
R-058
Abstract
Purpose: Hepatitis B, caused by the Hepatitis B virus (HBV), can harm the liver without noticeable symptoms. Early detection is crucial to prevent transmission and improve recovery. The main goal is to predict Hepatitis B from cost-effective laboratory test data using machine learning. The primary focus is on evaluating how effectively various algorithms predict the disease and how far they can improve early diagnosis.
Methods: Six distinct algorithms (Support Vector Machine, K-Nearest Neighbors, Logistic Regression, Decision Tree, Extreme Gradient Boosting, and Random Forest) were employed alongside an ensemble model. The analysis involved two rounds: one considering all features and one considering key attributes. The Synthetic Minority Oversampling Technique (SMOTE) was employed to address data imbalance. Various metrics, including the confusion matrix, precision, recall, F1 score, accuracy, receiver operating characteristic (ROC) curve, area under the curve (AUC), and mean absolute error (MAE), were used to assess the efficacy of each predictive technique. The National Health and Nutrition Examination Survey (NHANES) dataset was employed.
Results: The experimental results demonstrate that the ensemble model attained the highest accuracy (97%) and AUC (0.997) in comparison to existing models. The analysis revealed that specific crucial features possess substantial predictive significance within this model.
Conclusion: The study underscores the potential of the ensemble model as a valuable tool for medical practitioners, leveraging cost-effective and readily obtainable laboratory test data to predict Hepatitis B with high accuracy. By facilitating early diagnosis and intervention, this research presents a promising avenue to improve patient outcomes in the context of Hepatitis B.
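As an illustration of the oversampling step named in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. It is not the authors' implementation. The function name smote, its parameters, and the toy data are assumptions made for this example only, and the values are invented rather than taken from NHANES.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: a random fraction of the way from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors standing in for the minority (positive) class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10  # assumed size of the majority class

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into the data used for evaluation.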
Pages: 109-118
Number of pages: 10
Related papers
50 in total
[31]   A comparative ensemble approach to bedload prediction using metaheuristic machine learning [J].
Mir, Ajaz Ahmad ;
Patel, Mahesh ;
Albalawi, Fahad ;
Bajaj, Mohit ;
Tuka, Milkias Berhanu .
SCIENTIFIC REPORTS, 2024, 14 (01)
[33]   Cervical Cancer Identification with Synthetic Minority Oversampling Technique and PCA Analysis using Random Forest Classifier [J].
Geetha, R. ;
Sivasubramanian, S. ;
Kaliappan, M. ;
Vimal, S. ;
Annamalai, Suresh .
JOURNAL OF MEDICAL SYSTEMS, 2019, 43 (09)
[34]   Air Quality Forecasting Using Machine Learning: Comparative Analysis and Ensemble Strategies for Enhanced Prediction [J].
Ozupak, Yildirim ;
Alpsalaz, Feyyaz ;
Aslan, Emrah .
WATER AIR AND SOIL POLLUTION, 2025, 236 (07)
[35]   A Review: Machine Learning and Data Mining Approaches for Cardiovascular Disease Diagnosis and Prediction [J].
Rao G.S. ;
Muneeswari G. .
EAI Endorsed Transactions on Pervasive Health and Technology, 2024, 10
[36]   Improving the performance of machine learning model predicting phase and crystal structure of high entropy alloys by the synthetic minority oversampling technique [J].
Hareharen, K. ;
Panneerselvam, T. ;
Mohan, R. Raj .
JOURNAL OF ALLOYS AND COMPOUNDS, 2024, 991
[37]   Enterprise credit risk prediction using supply chain information: A decision tree ensemble model based on the differential sampling rate, Synthetic Minority Oversampling Technique and AdaBoost [J].
Yao, Gang ;
Hu, Xiaojian ;
Zhou, Taiyun ;
Zhang, Yue .
EXPERT SYSTEMS, 2022, 39 (06)
[38]   Probabilistic prediction of uniaxial compressive strength for rocks from sparse data using Bayesian Gaussian process regression with Synthetic Minority Oversampling Technique (SMOTE) [J].
Song, Chao ;
Zhao, Tengyuan ;
Xu, Ling ;
Huang, Xiaolin .
COMPUTERS AND GEOTECHNICS, 2024, 165
[39]   Prediction of LDL in hypertriglyceridemic subjects using an innovative ensemble machine learning technique [J].
Demirci, Ferhat ;
Emec, Murat ;
Doruk, Ozlem Gursoy ;
Ozcanhan, Mehmet Hilal ;
Ormen, Murat ;
Akan, Pinar .
TURKISH JOURNAL OF BIOCHEMISTRY-TURK BIYOKIMYA DERGISI, 2024, 48 (06) :641-652
[40]   Prediction of an educational institute learning environment using machine learning and data mining [J].
Shoaib, Muhammad ;
Sayed, Nasir ;
Amara, Nedra ;
Latif, Abdul ;
Azam, Sikandar ;
Muhammad, Sajjad .
EDUCATION AND INFORMATION TECHNOLOGIES, 2022, 27 (07) :9099-9123