Missing data techniques in classification for cardiovascular dysautonomias diagnosis

Authors
Ali Idri
Ilham Kadi
Ibtissam Abnane
José Luis Fernandez-Aleman
Affiliations
[1] Mohammed V University, Software Project Management Research Team
[2] Mohammed VI Polytechnic University, CSEHS
[3] University of Murcia, MSDA
Source
Medical & Biological Engineering & Computing | 2020 | Vol. 58, No. 11
Keywords
Missing data; KNN imputation; Missingness mechanism; Cardiology
Abstract
Missing data (MD) is a common and unavoidable problem for data mining (DM)-based decision systems in e-health, since many historical medical datasets contain a large number of missing values. A pre-processing stage is therefore usually required to handle missing values before any DM-based decision system is built. The purpose of this paper is to evaluate the impact of MD techniques on classification systems for cardiovascular dysautonomias diagnosis. We analyzed and compared the accuracy rates of four classification techniques, random forest (RF), support vector machines (SVM), C4.5 decision tree, and Naive Bayes (NB), under two MD techniques: deletion and imputation with k-nearest neighbors (KNN). A total of 216 experiments were carried out, combining three missingness mechanisms (MCAR: missing completely at random, MAR: missing at random, and NMAR: not missing at random), two MD techniques (deletion and KNN imputation), nine MD percentages (10% to 90%), and the four classifiers (3 × 2 × 9 × 4 = 216), on a dataset collected from the autonomic nervous system (ANS) unit of the University Hospital Avicenne in Morocco. The results suggest that KNN imputation, rather than deletion, improves the accuracy rates of the four classifiers. Moreover, the MD percentage has a negative impact on classification performance regardless of the MD mechanism and MD technique used: the accuracy rates of the four classifiers decrease as the MD percentage increases.
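The experimental design described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code and does not use their dataset. The function names (`inject_missing`, `listwise_delete`, `knn_impute`) are hypothetical, the data is synthetic, the 10-point spacing of the MD percentages is an assumption, and the MAR and NMAR masks use a deliberately simple deterministic rule (the rows with the largest values lose their entry) rather than whatever procedure the paper applied.

```python
import itertools
import math
import random


def inject_missing(rows, col, rate, mechanism, rng, driver_col=0):
    """Return a copy of rows with round(rate * n) values of `col` set to None."""
    n_missing = round(rate * len(rows))
    idx = range(len(rows))
    if mechanism == "MCAR":    # every row is equally likely to lose its value
        chosen = rng.sample(idx, n_missing)
    elif mechanism == "MAR":   # missingness driven by another, observed column
        chosen = sorted(idx, key=lambda i: rows[i][driver_col], reverse=True)[:n_missing]
    elif mechanism == "NMAR":  # missingness driven by the hidden value itself
        chosen = sorted(idx, key=lambda i: rows[i][col], reverse=True)[:n_missing]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    holed = [list(r) for r in rows]
    for i in chosen:
        holed[i][col] = None
    return holed


def listwise_delete(rows):
    """Deletion: keep only the rows that have no missing value."""
    return [r for r in rows if None not in r]


def knn_impute(rows, k=3):
    """Fill each None with the column mean over the k nearest complete rows."""
    complete = listwise_delete(rows)
    if not complete:
        raise ValueError("KNN imputation needs at least one complete row")
    filled = []
    for r in rows:
        observed = [j for j, v in enumerate(r) if v is not None]

        def dist(c):
            # Euclidean distance restricted to the features observed in r
            return math.sqrt(sum((r[j] - c[j]) ** 2 for j in observed))

        neighbors = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else sum(c[j] for c in neighbors) / len(neighbors)
            for j, v in enumerate(r)
        ])
    return filled


# Experiment grid from the abstract: mechanisms x techniques x percentages x classifiers.
mechanisms = ("MCAR", "MAR", "NMAR")
techniques = ("deletion", "KNN imputation")
percentages = range(10, 100, 10)  # assumed 10-point steps: 10, 20, ..., 90
classifiers = ("RF", "SVM", "C4.5", "NB")
grid = list(itertools.product(mechanisms, techniques, percentages, classifiers))
print(len(grid), "experiment configurations")

# Synthetic stand-in data: 50 rows, 3 numeric features.
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
for mechanism in mechanisms:
    holed = inject_missing(data, col=2, rate=0.3, mechanism=mechanism, rng=rng)
    kept = listwise_delete(holed)
    filled = knn_impute(holed, k=3)
    print(mechanism, "rows after deletion:", len(kept), "rows after imputation:", len(filled))
```

Deletion shrinks the training set as the MD percentage grows, while imputation keeps every row at the cost of estimated values. That trade-off is what the accuracy comparison in the abstract measures.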
Pages: 2863-2878
Number of pages: 15