Comparative study on the performance of different classification algorithms, combined with pre- and post-processing techniques to handle imbalanced data, in the diagnosis of adult patients with familial hypercholesterolemia

Cited by: 11
Authors
Albuquerque, Joao [1 ,2 ,3 ]
Medeiros, Ana Margarida [3 ,4 ]
Alves, Ana Catarina [3 ,4 ]
Bourbon, Mafalda [3 ,4 ]
Antunes, Marilia [2 ,5 ]
Affiliations
[1] Univ Porto, Fac Med, Dept Biomed, Unidade Bioquim, Porto, Portugal
[2] Univ Lisbon, Fac Ciencias, Ctr Estat & Aplicacoes, Lisbon, Portugal
[3] Inst Nacl Saude Doutor Ricardo Jorge, Dept Promocao Saude & Prevencao Doencas Nao Transm, Grp Invest Cardiovasc, Lisbon, Portugal
[4] Univ Lisbon, Fac Ciencias, Inst Biossistemas & Ciencias Integrat, Lisbon, Portugal
[5] Univ Lisbon, Fac Ciencias, Dept Estat & Invest Operac, Lisbon, Portugal
Keywords
PREVALENCE; VALIDATION;
DOI
10.1371/journal.pone.0269713
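Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes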
07 ; 0710 ; 09 ;
Abstract
Familial Hypercholesterolemia (FH) is an inherited disorder of cholesterol metabolism. Current criteria for FH diagnosis, such as the Simon Broome (SB) criteria, lead to high false positive rates. The aim of this work was to explore alternative classification procedures for FH diagnosis, based on different biological and biochemical indicators. For this purpose, logistic regression (LR), naive Bayes classifier (NB), random forest (RF) and extreme gradient boosting (XGB) algorithms were combined with the Synthetic Minority Oversampling Technique (SMOTE), or with threshold adjustment by maximizing the Youden index (YI), and compared. The data were tested through a 10 x 10 repeated k-fold cross-validation design. The LR model presented an overall better performance, as assessed by the areas under the receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves, and by several operating characteristics (OC), regardless of the strategy used to cope with class imbalance. When adopting either data processing technique, significantly higher accuracy (Acc), G-mean and F1 score values were found for all classification algorithms, compared to the SB criteria (p < 0.01), revealing a more balanced predictive ability for both classes, and higher effectiveness in classifying FH patients. Adjustment of the cut-off values through pre- or post-processing methods revealed a considerable gain in sensitivity (Sens) values (p < 0.01). Although the performance of the pre- and post-processing strategies was similar, SMOTE does not cause the model's parameters to lose interpretability. These results suggest that an LR model combined with SMOTE can be an optimal approach to be used as a widespread screening tool.
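As an illustration of the post-processing strategy named in the abstract, the following minimal sketch picks the cut-off that maximizes the Youden index (sensitivity + specificity - 1) and reports sensitivity, specificity, G-mean and F1 score at that cut-off. It is not the authors' implementation: the function names and the toy labels and scores are hypothetical, invented only to show the arithmetic on an imbalanced sample, and the candidate cut-offs are assumed to be the observed scores.

```python
from math import sqrt


def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def operating_characteristics(labels, scores, threshold):
    """Sensitivity, specificity, Youden index, G-mean and F1 at one cut-off."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {
        "sens": sens,
        "spec": spec,
        "youden": sens + spec - 1,
        "g_mean": sqrt(sens * spec),
        "f1": f1,
    }


def youden_threshold(labels, scores):
    """Return the observed score that maximizes the Youden index."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: operating_characteristics(labels, scores, t)["youden"],
    )


# Hypothetical toy sample: 3 positives among 10 subjects (imbalanced).
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]

default_oc = operating_characteristics(labels, scores, 0.5)
best_cutoff = youden_threshold(labels, scores)
adjusted_oc = operating_characteristics(labels, scores, best_cutoff)

print("default cut-off 0.5:", default_oc)
print("Youden cut-off", best_cutoff, ":", adjusted_oc)
```

On this toy sample the default cut-off of 0.5 detects only one of the three positives, while the Youden-selected cut-off recovers all three at the cost of one false positive. That is the kind of sensitivity gain the abstract describes, although the numbers here carry no clinical meaning.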
Pages: 19