Techniques to Deal with Missing Data

Cited by: 0
Authors
Sessa, Jadran [1]
Syed, Dabeeruddin [1]
Affiliations
[1] Masdar Inst Sci & Technol, Dept Elect Engn & Comp Sci, Abu Dhabi, U Arab Emirates
Source
2016 5TH INTERNATIONAL CONFERENCE ON ELECTRONIC DEVICES, SYSTEMS AND APPLICATIONS (ICEDSA) | 2016
Keywords
Data mining; Missing data; Missing values; Probabilistic approach; k-NN imputation; Mean and Median imputation; IMPUTATION; VALUES; DATABASES;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Data is available in enormous amounts in the real world, but none of it is of practical use unless it is converted into useful information. Knowledge discovery is hindered, however, because real data is often incomplete and noisy. The problem of recovering missing data has therefore become one of the most important in the field of data mining. Filling in missing data is a significant task: because the given datasets are generally very small, it is paramount to use all of the available data. In this paper, we deal with real data that has many missing values, and we process it in three phases. The first phase performs feature selection. The second phase iteratively fills in the missing values using a probabilistic approach, taking into account that features can be either nominal or numerical. The third phase corrects the missing values that have been filled in. We compare two imputation methods for dealing with missing data, namely k-NN imputation and mean and median imputation. We find that both imputation methods are efficient and yield roughly the same accuracy.
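The two imputation methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function names (simple_impute, knn_impute), the use of None to mark a missing value, the Euclidean distance over shared features, the choice of k, and the use of the mode for nominal features are all assumptions made for illustration.

```python
import math
from statistics import mean, median, mode


def simple_impute(column, strategy="mean"):
    """Fill None entries of one column with a statistic of the observed values.

    Assumption: "mean" and "median" suit numerical features, while "mode"
    is used here as a stand-in for nominal features.
    """
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


def knn_impute(rows, k=2):
    """Fill None entries of numerical rows with the mean of the k nearest donors.

    Assumption: distance is the root mean squared difference over the
    features that are observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features, so the rows are not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = mean(nearest)
    return filled


if __name__ == "__main__":
    print(simple_impute([1.0, None, 3.0], "mean"))
    print(simple_impute(["a", None, "a", "b"], "mode"))
    print(knn_impute([[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 2.2]], k=2))
```

Mean and median imputation ignore the other features of a record, whereas the k-NN variant borrows values from the most similar records, which is the main design difference between the two methods in this sketch.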
Pages: 4