Machine Learning-Based Risk Prediction of Discharge Status for Sepsis

Cited by: 0
Authors
Cai, Kaida [1, 2]
Lou, Yuqing [2]
Wang, Zhengyan [2]
Yang, Xiaofang [2]
Zhao, Xin [2, 3]
Affiliations
[1] Southeast University, School of Public Health, Nanjing 210009, People's Republic of China
[2] Southeast University, School of Mathematics, Nanjing 210009, People's Republic of China
[3] Southeast University, Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing 210096, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
machine learning; feature selection; information gain; missing data imputation; sepsis; INTERNATIONAL CONSENSUS DEFINITIONS; IMPUTATION; MORTALITY; SELECTION;
DOI
10.3390/e26080625
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
As a severe inflammatory response syndrome, sepsis presents complex challenges in predicting patient outcomes due to its unclear pathogenesis and the unstable discharge status of affected individuals. In this study, we develop a machine learning-based method for predicting the discharge status of sepsis patients, aiming to improve treatment decisions. To enhance the robustness of our analysis against outliers, we incorporate robust statistical methods, specifically the minimum covariance determinant technique. We utilize the random forest imputation method to effectively manage and impute missing data. For feature selection, we employ Lasso penalized logistic regression, which efficiently identifies significant predictors and reduces model complexity, setting the stage for the application of more complex predictive methods. Our predictive analysis incorporates multiple machine learning methods, including random forest, support vector machine, and XGBoost. We compare the prediction performance of these methods with Lasso penalized logistic regression to identify the most effective approach. Each method's performance is rigorously evaluated through ten iterations of 10-fold cross-validation to ensure robust and reliable results. Our comparative analysis reveals that XGBoost surpasses the other models, demonstrating its exceptional capability to navigate the complexities of sepsis data effectively.
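The evaluation protocol named in the abstract, ten iterations of 10-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `repeated_kfold_indices` and `repeated_cv_score`, the majority-class baseline, and the toy labels are all hypothetical, and plain (unstratified) k-fold splitting with accuracy as the metric is an assumption, since the abstract does not say how folds were formed or which metric was used.

```python
import random
from collections import Counter
from statistics import mean


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    if n_samples < n_splits:
        raise ValueError("need at least one sample per fold")
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        for fold in range(n_splits):
            # Every n_splits-th shuffled index forms the held-out fold.
            test_idx = indices[fold::n_splits]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


def repeated_cv_score(fit, predict, X, y, n_splits=10, n_repeats=10, seed=0):
    """Return (mean accuracy, number of folds) over all repeats and folds."""
    scores = []
    for _, _, train_idx, test_idx in repeated_kfold_indices(
        len(y), n_splits, n_repeats, seed
    ):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores), len(scores)


# Hypothetical stand-in model: always predict the majority training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model for _ in X_test]


# Toy data (invented for illustration): 35 negative and 15 positive labels.
X = [[i] for i in range(50)]
y = [0] * 35 + [1] * 15

score, n_folds = repeated_cv_score(fit_majority, predict_majority, X, y)
print(f"mean accuracy over {n_folds} folds: {score:.3f}")
```

Averaging over 100 held-out folds rather than 10 reduces the dependence of the estimate on any single random partition. In practice, each candidate model (Lasso penalized logistic regression, random forest, support vector machine, XGBoost) would be passed through the same splits in place of the baseline so that the comparison is made on identical folds.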
Pages: 12