A machine learning approach in a monocentric cohort for predicting primary refractory disease in Diffuse Large B-cell lymphoma patients

Cited by: 2
Authors
Detrait, Marie Y. [1]
Warnon, Stephanie [2]
Lagasse, Raphael [1, 3, 4]
Dumont, Laurent [1]
De Prophetis, Stephanie [5]
Hansenne, Amandine [5]
Raedemaeker, Juliette [5]
Robin, Valerie [5]
Verstraete, Geraldine [5]
Gillain, Aline [2]
Depasse, Nicolas [1]
Jacmin, Pierre [1]
Pranger, Delphine [5]
Affiliations
[1] Grand Hop Charleroi, Dept Technol & Informat Syst, Charleroi, Belgium
[2] Grand Hop Charleroi, Dept Clin Res, Charleroi, Belgium
[3] Grand Hop Charleroi, Dept Medicoecon Informat, Charleroi, Belgium
[4] Univ Libre Bruxelles ULB, Sch Publ Hlth, Brussels, Belgium
[5] Grand Hop Charleroi, Oncol Dept, Div Hematol Hematol, Charleroi, Belgium
Keywords
PRECISION; PROGNOSIS;
DOI
10.1371/journal.pone.0311261
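Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code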
07 ; 0710 ; 09 ;
Abstract
Introduction: Primary refractory disease affects 30-40% of patients diagnosed with diffuse large B-cell lymphoma (DLBCL) and is a significant challenge in disease management because of its poor prognosis. Predicting refractory status could greatly inform treatment strategies and enable early intervention, since various treatment options are now available depending on patient and disease characteristics. Supervised machine-learning techniques, which can predict outcomes in a medical context, appear highly suitable for this purpose.
Design: Retrospective monocentric cohort study.
Patient population: Adult patients with a first diagnosis of DLBCL admitted to the hematology unit from 2017 to 2022.
Aim: We evaluated five supervised machine-learning (ML) models at our center as tools for predicting primary refractory DLBCL.
Main results: One hundred and thirty patients with DLBCL were included in this study between January 2017 and December 2022. The variables used for analysis included demographic characteristics, clinical condition, disease characteristics, first-line therapy, and whether a PET-CT scan was performed after 2 cycles of treatment. We compared five supervised ML models for predicting primary refractory disease: support vector machine (SVM), Random Forest Classifier (RFC), Logistic Regression (LR), Naïve Bayes (NB) Categorical classifier, and eXtreme Gradient Boost (XGBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), accuracy, false positive rate, sensitivity, and F1-score to identify the best model. After a median follow-up of 19.5 months, the overall survival rate in the cohort was 60%. Using the Kaplan-Meier method, overall survival at 3 years was 58.5% (95% CI, 51-68.5) and 3-year progression-free survival was 63% (95% CI, 54-71). Of the 124 patients who received a first-line treatment, primary refractory disease occurred in 42 patients (33.8%) and 2 patients (1.6%) relapsed within 6 months. In univariate analysis, the factors associated with refractory disease status were age (p = 0.009), Ann Arbor stage (p = 0.013), CMV infection (p = 0.012), comorbidity (p = 0.019), IPI score (p < 0.001), first line of treatment (p < 0.001), EBV infection (p = 0.008), and socioeconomic status (p = 0.02). The NB Categorical classifier was the top-performing model on the validation set, with a ROC-AUC of 0.81 (95% CI, 0.64-0.96), an accuracy of 83%, an F1-score of 0.82, and a low false positive rate of 10%. The XGBoost model and the RFC followed, with ROC-AUCs of 0.74 (95% CI, 0.52-0.93) and 0.67 (95% CI, 0.46-0.88), accuracies of 78% and 72%, and F1-scores of 0.75 and 0.67, respectively, and a false positive rate of 10% for both. The other two models, SVM and LR, performed worse, with ROC-AUCs of 0.65 (95% CI, 0.40-0.87) and 0.45 (95% CI, 0.29-0.64), accuracies of 67% and 50%, F1-scores of 0.64 and 0.43, and false positive rates of 28% and 37%, respectively.
Conclusion: Machine learning algorithms, particularly the NB Categorical classifier, have the potential to improve the prediction of primary refractory disease in DLBCL patients and thereby provide a novel decision-making tool for managing this condition. Multicenter studies in larger cohorts are needed to validate these results.
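The evaluation metrics named in the abstract (accuracy, sensitivity, false positive rate, F1-score, ROC-AUC) can be illustrated with a minimal sketch. This is not the authors' code: the labels and scores below are invented toy values, the 0.5 decision threshold is an assumption, and the F1-score shown is the plain positive-class F1, which may differ from the averaging scheme used in the paper. Label 1 stands for "primary refractory" and 0 for "not refractory".

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, false positive rate and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0          # false positive rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "false_positive_rate": fpr,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy validation set (hypothetical values, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.7, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

With a validation set as small as the one implied by the wide confidence intervals in the abstract, each of these metrics can shift noticeably when a single patient is reclassified, which is one reason the reported intervals matter as much as the point estimates.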
Pages: 19
Related papers
38 items in total
[1]   Electronic Patient-Reported Data Capture as a Foundation of Rapid Learning Cancer Care [J].
Abernethy, Amy P. ;
Ahmad, Asif ;
Zafar, S. Yousuf ;
Wheeler, Jane L. ;
Reese, Jennifer Barsky ;
Lyerly, H. Kim .
MEDICAL CARE, 2010, 48 (06) :S32-S38
[2]   Bayesian Networks for Risk Prediction Using Real-World Data: A Tool for Precision Medicine [J].
Arora, Paul ;
Boyne, Devon ;
Slater, Justin J. ;
Gupta, Alind ;
Brenner, Darren R. ;
Druzdzel, Marek J. .
VALUE IN HEALTH, 2019, 22 (04) :439-445
[3]   Big Data and Machine Learning in Health Care [J].
Beam, Andrew L. ;
Kohane, Isaac S. .
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2018, 319 (13) :1317-1318
[4]   Electronic patient-reported outcome systems in oncology clinical practice [J].
Unknown .
CA-A CANCER JOURNAL FOR CLINICIANS, 2012, 62 (05) :336-347
[5]   A naive Bayes classifier for planning transfusion requirements in heart surgery [J].
Cevenini, Gabriele ;
Barbini, Emanuela ;
Massai, Maria R. ;
Barbini, Paolo .
JOURNAL OF EVALUATION IN CLINICAL PRACTICE, 2013, 19 (01) :25-29
[6]   XGBoost: A Scalable Tree Boosting System [J].
Chen, Tianqi ;
Guestrin, Carlos .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :785-794
[7]   Outcomes in refractory diffuse large B-cell lymphoma: results from the international SCHOLAR-1 study [J].
Crump, Michael ;
Neelapu, Sattva S. ;
Farooq, Umar ;
Van den Neste, Eric ;
Kuruvilla, John ;
Westin, Jason ;
Link, Brian K. ;
Hay, Annette ;
Cerhan, James R. ;
Zhu, Liting ;
Boussetta, Sami ;
Feng, Lei ;
Maurer, Matthew J. ;
Navale, Lynn ;
Wiezorek, Jeff ;
Go, William Y. ;
Gisselbrecht, Christian .
BLOOD, 2017, 130 (16) :1800-1808
[8]  
Davidson-Pilon C., 2019, JOSS, V4, P1317, DOI DOI 10.21105/JOSS.01317
[9]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[10]   Interim FDG PET/CT as a prognostic factor in diffuse large B-cell lymphoma [J].
Fuertes, Silvia ;
Setoain, Xavier ;
Lopez-Guillermo, Armando ;
Carrasco, Josep-Lluis ;
Rodriguez, Sonia ;
Rovira, Jordina ;
Pons, Francesca .
EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2013, 40 (04) :496-504