XGBoost-SHAP-based interpretable diagnostic framework for knee osteoarthritis: a population-based retrospective cohort study

Cited by: 2
Authors
Fan, Zijuan [1, 2]
Song, Wenzhu [3]
Ke, Yan [4]
Jia, Ligan [5]
Li, Songyan [1]
Li, Jiao Jiao [6]
Zhang, Yuqing [7]
Lin, Jianhao [4]
Wang, Bin [1]
Affiliations
[1] Zhejiang Univ, Sch Med, Affiliated Hosp 1, Dept Orthopaed Surg, Qingchun Rd 79, Hangzhou, Peoples R China
[2] Sun Yat Sen Univ, Sch Publ Hlth, Dept Hlth Stat, Guangzhou, Peoples R China
[3] Zhejiang Univ Sch Med, Dept Big Data Hlth Sci, Sch Med, Sch Publ Hlth, Hangzhou, Zhejiang, Peoples R China
[4] Peking Univ, Peoples Hosp, Arthrit Clin & Res Ctr, Beijing, Peoples R China
[5] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi, Peoples R China
[6] Univ Technol Sydney, Fac Engn & IT, Sch Biomed Engn, Sydney, Australia
[7] Harvard Med Sch, Boston, MA USA
Funding
National Natural Science Foundation of China;
Keywords
Knee osteoarthritis; Classification; XGBoost; Boruta; SHAP; Machine learning; CHINESE POPULATION; EXPLAINABLE AI; PREVALENCE; HIP; GAIT;
DOI
10.1186/s13075-024-03450-2
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Objective: To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features.
Methods: In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set, and their performance was further analyzed on an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-mean, and F1 score. The best model was explained using Shapley values to extract highly ranked features.
Results: A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, a G-mean of 0.800, and an F1 score of 0.703. The SHAP method revealed the top 17 features of KOA according to the importance ranking, and the average of the experience of joint pain was identified as the most important feature.
Conclusions: Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach.
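The threshold-based evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and the G-mean is assumed to follow the common definition as the geometric mean of sensitivity and specificity, which the abstract does not spell out. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute threshold-based metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator is
    non-zero (each class and each predicted label occurs at least once).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = sqrt(sensitivity * specificity)
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the study.
    for name, value in binary_metrics(tp=40, fp=10, tn=40, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Because the G-mean is low whenever either sensitivity or specificity is low, it is often reported alongside accuracy for imbalanced data such as a cohort in which 26.3% of participants have KOA.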
Pages: 16