Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach

Cited by: 18
Authors
Sanchez-Hernandez, Fernando [1]
Carlos Ballesteros-Herraez, Juan [2]
Kraiem, Mohamed S. [3]
Sanchez-Barba, Mercedes [4]
Moreno-Garcia, Maria N. [3]
Affiliations
[1] Univ Salamanca, Fac Nursing & Physiotherapy, Salamanca 37007, Spain
[2] Univ Hosp Salamanca, Intens Care Unit, Salamanca 37007, Spain
[3] Univ Salamanca, Dept Comp & Automat, E-37008 Salamanca, Spain
[4] Univ Salamanca, Dept Stat, Salamanca 37007, Spain
Source
APPLIED SCIENCES-BASEL | 2019, Vol. 9, Issue 24
Keywords
ensemble classifiers; healthcare-associated infections; ICU infections; imbalanced data; machine learning; oversampling; undersampling; decision trees; classification; system; rules
DOI
10.3390/app9245287
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge for current health systems, given the impact of such infections on patient mortality and healthcare costs. This work focuses on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The goal is to support decision making aimed at reducing the incidence rate of infections. In this field, reliable classifiers must be built from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. To validate the proposal, we conducted a comparative study with data from 4616 patients, applying several single and ensemble classifiers both to the original dataset and to data preprocessed by different resampling methods. The results were analyzed with classic and recent metrics specifically designed for imbalanced data classification. They showed that the proposal is more efficient than the other approaches.
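The abstract does not spell out how the clustering-based undersampling works, so the following is only a minimal sketch of one common variant, not the authors' procedure. It assumes that the majority class is clustered with k-means into as many clusters as there are minority samples, and that the real majority sample nearest to each centroid is kept. The function names (`kmeans`, `cluster_undersample`), the synthetic data, and the choice of k are all illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative helper); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


def cluster_undersample(X, y, majority_label=0, seed=0):
    """Shrink the majority class to roughly the minority-class size.

    Assumption: one real majority sample (the one nearest to each
    k-means centroid) stands in for its whole cluster.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    k = len(min_idx)
    centroids, _ = kmeans(X[maj_idx], k, seed=seed)
    d2 = ((X[maj_idx][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Nearest majority sample per centroid; duplicates are collapsed,
    # so slightly fewer than k samples may be kept.
    nearest = np.unique(d2.argmin(axis=0))
    keep = np.concatenate([maj_idx[nearest], min_idx])
    return X[keep], y[keep]


# Synthetic, purely illustrative data: 200 majority vs. 20 minority samples.
rng = np.random.default_rng(42)
X_maj = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_min = rng.normal(loc=2.5, scale=0.5, size=(20, 2))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 200 + [1] * 20)

X_bal, y_bal = cluster_undersample(X, y, majority_label=0)
print("majority kept:", int((y_bal == 0).sum()),
      "minority kept:", int((y_bal == 1).sum()))
```

The balanced arrays `X_bal` and `y_bal` would then be passed to an ensemble classifier such as a random forest or a boosting method. Keeping real samples rather than the centroids themselves is a design choice made here so that the training set contains only observed patients; other variants use the centroids directly.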
Pages: 26