An Unsupervised Error Detection Methodology for Detecting Mislabels in Healthcare Analytics

Cited by: 0
Authors
Zhou, Pei-Yuan [1]
Lum, Faith [1]
Wang, Tony Jiecao [1]
Bhatti, Anubhav [2]
Parmar, Surajsinh [2]
Dan, Chen [2]
Wong, Andrew K. C. [1]
Affiliations
[1] Univ Waterloo, Dept Syst Design Engn, Waterloo, ON N2L 3G1, Canada
[2] SpassMed Inc, AI Engn Team, Toronto, ON M5H 2S6, Canada
Source
BIOENGINEERING-BASEL | 2024 / Vol. 11 / Issue 08
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
unsupervised learning; error detection; pattern discovery and disentanglement; healthcare data analysis
DOI
10.3390/bioengineering11080770
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Medical datasets may be imbalanced and contain errors due to subjective test results and clinical variability. The poor quality of the original data reduces classification accuracy and reliability. Hence, detecting abnormal samples in a dataset can help clinicians make better decisions. In this study, we propose an unsupervised error detection method that uses patterns discovered by the Pattern Discovery and Disentanglement (PDD) model developed in our earlier work. Applied to a large dataset, the eICU Collaborative Research Database, for sepsis risk assessment, the proposed algorithm can effectively discover statistically significant association patterns, generate an interpretable knowledge base, cluster samples in an unsupervised manner, and detect abnormal samples in the dataset. In our experiments, the method outperformed K-Means by 38% on the full dataset and 47% on the reduced dataset for unsupervised clustering. Multiple supervised classifiers improved accuracy by an average of 4% after the abnormal samples flagged by the proposed error detection approach were removed. The proposed algorithm therefore provides a robust and practical solution for unsupervised clustering and error detection in healthcare data.
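The general idea of flagging possible mislabels from an unsupervised clustering can be illustrated with a minimal sketch. This is not the PDD method described in the abstract: it assumes cluster assignments are already available from some clustering step and applies a simple majority-vote rule swapped in for illustration. The function name `flag_suspect_labels` and the toy data are hypothetical.

```python
from collections import Counter


def flag_suspect_labels(cluster_ids, labels):
    """Return indices of samples whose label differs from the
    majority label of the cluster they were assigned to.

    Assumption: a simple majority-vote heuristic, not the
    pattern-based PDD procedure from the paper.
    """
    # Tally how often each label occurs inside each cluster.
    counts_by_cluster = {}
    for cid, lab in zip(cluster_ids, labels):
        counts_by_cluster.setdefault(cid, Counter())[lab] += 1

    # Pick the most frequent label per cluster.
    majority = {
        cid: counts.most_common(1)[0][0]
        for cid, counts in counts_by_cluster.items()
    }

    # A sample is suspect when its label disagrees with its cluster.
    return [
        i
        for i, (cid, lab) in enumerate(zip(cluster_ids, labels))
        if lab != majority[cid]
    ]


# Toy data (hypothetical): two clusters, one disagreeing label in each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["sepsis", "sepsis", "sepsis", "no_sepsis",
          "no_sepsis", "no_sepsis", "sepsis", "no_sepsis"]

suspects = flag_suspect_labels(cluster_ids, labels)
print(suspects)
```

Samples flagged this way would be candidates for review or removal before training a supervised classifier, which is the kind of workflow the abstract evaluates.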
Pages: 22