Missing data imputation with fuzzy feature selection for diabetes dataset

Citations: 0
Authors
Mohamad Faiz Dzulkalnine
Roselina Sallehuddin
Affiliations
[1] Universiti Teknologi Malaysia, Faculty of Computing
Source
SN Applied Sciences | 2019 / Vol. 1
Keywords
Missing data; Fuzzy feature selection; Imputation; Classification;
DOI
Not available
Abstract
Missing data remain a difficulty for data analysis in many research fields, especially in medicine, where they affect the diagnosis and treatment a patient receives. In this research, fuzzy c-means (FCM) is used to impute the missing data. However, like most data imputation methods, FCM does not consider the presence of irrelevant features. Irrelevant features can increase the computational time of the imputation process and decrease the accuracy of the prediction. Feature selection techniques can alleviate this problem by selecting the most relevant features and reducing the dataset size. Fuzzy principal component analysis (FPCA) is used as the feature selection method in this study because, unlike classical PCA, it accounts for outliers, which are the main reason some features become irrelevant. Therefore, an improved hybrid imputation model, FPCA–Support vector machines–FCM (FPCA–SVM–FCM), is proposed and employed in this study. The efficiency of the proposed model is investigated on one dataset, the Pima Indians Diabetes dataset. Experimental results showed that the proposed hybrid imputation model outperforms the existing methods, producing more accurate estimates in terms of accuracy, RMSE and MAE. The proposed method was also validated using the Wilcoxon rank sum test and Theil’s U test and obtained good results compared with SVM–FCM. Therefore, it can be used as an alternative tool for handling missing data in order to obtain a better-quality dataset.
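To make the FCM imputation step in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' FPCA–SVM–FCM pipeline: the feature selection and SVM stages are omitted, and the handling of incomplete rows (a "partial distance" computed over observed features only, with missing entries filled by a membership-weighted average of the cluster centers) is an assumption chosen for illustration. The function name `fcm_impute` and all of its parameters are hypothetical, and the sketch assumes every row has at least one observed value.

```python
import random


def fcm_impute(rows, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fill None entries using fuzzy c-means with partial distances.

    Illustrative sketch only; `m` is the usual FCM fuzzifier (m > 1).
    """
    n, p = len(rows), len(rows[0])
    rng = random.Random(seed)

    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(w)
        U.append([v / total for v in w])

    centers = [[0.0] * p for _ in range(n_clusters)]
    for _ in range(n_iter):
        # Update each center coordinate from the observed entries only.
        for k in range(n_clusters):
            for j in range(p):
                num = den = 0.0
                for i in range(n):
                    if rows[i][j] is not None:
                        weight = U[i][k] ** m
                        num += weight * rows[i][j]
                        den += weight
                centers[k][j] = num / den if den > 0 else 0.0

        # Update memberships from partial squared distances, rescaled by
        # p / (number of observed features) so rows stay comparable.
        change = 0.0
        for i in range(n):
            obs = [j for j in range(p) if rows[i][j] is not None]
            scale = p / len(obs)
            d2 = [
                max(scale * sum((rows[i][j] - c[j]) ** 2 for j in obs), 1e-12)
                for c in centers
            ]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new = [v / total for v in inv]
            change = max(change, max(abs(a - b) for a, b in zip(new, U[i])))
            U[i] = new
        if change < tol:
            break

    # Replace each missing entry with the membership-weighted center value.
    return [
        [
            rows[i][j]
            if rows[i][j] is not None
            else sum(U[i][k] * centers[k][j] for k in range(n_clusters))
            for j in range(p)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy data (made up for illustration, not from the Pima dataset).
    sample = [[1.0, 1.1], [0.9, 1.0], [1.1, None],
              [5.0, 5.1], [5.1, 4.9], [4.9, None]]
    for row in fcm_impute(sample):
        print(row)
```

Observed values are returned unchanged; only `None` entries are estimated. Imputed values could then be compared against held-out true values with RMSE and MAE, the error measures named in the abstract.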
Related papers
50 in total
  • [21] Missing Data Imputation Toolbox for MATLAB
    Folch-Fortuny, Abel
    Arteaga, Francisco
    Ferrer, Alberto
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2016, 154 : 93 - 100
  • [22] Evaluating the Impact of Missing Data Imputation
    Pantanowitz, Adam
    Marwala, Tshilidzi
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 577 - 586
  • [23] Multi-Objective Feature Selection With Missing Data in Classification
    Xue, Yu
    Tang, Yihang
    Xu, Xin
    Liang, Jiayu
    Neri, Ferrante
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2022, 6 (02) : 355 - 364
  • [24] Feature selection with missing data using mutual information estimators
    Doquire, Gauthier
    Verleysen, Michel
    NEUROCOMPUTING, 2012, 90 : 3 - 11
  • [25] Cooperative Clustering Missing Data Imputation
    Wan, Daoming
    Razavi-Far, Roozbeh
    Saif, Mehrdad
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020 : 1039 - 1045
  • [26] OPTIMAL BAYESIAN FEATURE SELECTION WITH MISSING DATA
    Pour, Ali Foroughi
    Dalton, Lori A.
    2016 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2016 : 35 - 39
  • [27] A HYBRID SELF ORGANIZING MAP IMPUTATION (SOMI) WITH NAIVE BAYES FOR IMPUTATION MISSING DATA CLASSIFICATION
    Khotimah, Bain Khusnul
    Miswanto
    Suprajitno, Herry
    INTERNATIONAL JOURNAL OF GEOMATE, 2019, 17 (62) : 195 - 202
  • [28] Missing data imputation: focusing on single imputation
    Zhang, Zhongheng
    ANNALS OF TRANSLATIONAL MEDICINE, 2016, 4 (01)
  • [29] Toward semantic data imputation for a dengue dataset
    Kamkhad, N.
    Jampachaisri, K.
    Siriyasatien, P.
    Kesorn, K.
    KNOWLEDGE-BASED SYSTEMS, 2020, 196
  • [30] Variable selection for additive models with missing data via multiple imputation
    Yuta Shimazu
    Takayuki Yamaguchi
    Ibuki A. J. Hoshina
    Hidetoshi Matsui
    Behaviormetrika, 2025, 52 (1) : 163 - 178