A hybrid method for missing value imputation

Cited by: 2

Authors:

Karanikola, Aikaterini [1]

Kotsiantis, Sotiris [1]

Affiliations:

[1] Univ Patras, Dept Math, Rion, Greece

Source:

PROCEEDINGS OF THE 23RD PAN-HELLENIC CONFERENCE OF INFORMATICS (PCI 2019) | 2019

Keywords:

Machine Learning; Data preprocessing; Missing values imputation; Imputation strategies; Multiple imputation; Regression

DOI:

10.1145/3368640.3368653

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Missing values are a common occurrence in a great number of real-world datasets from diverse domains of interest. In research, missing data constitute a significant problem, as they can affect the conclusions drawn from the data. Data preprocessing therefore becomes more difficult, since selecting an inappropriate way to handle missing information can lead to untrustworthy results. Unfortunately, as in most cases in Machine Learning, no single solution fits every task related to the problem. For this reason, many strategies have been proposed to deal with this issue. One of the best known, and also efficient, is imputation. Replacing a missing value with an estimate apparently eliminates the problem and yields complete datasets, but the difficulty shifts to selecting the right method to impute the missing values. A widely used imputation method, found in the libraries of the most noted statistical and Machine Learning suites, is IRMI. In this work, we propose a variant of IRMI that maintains the advantages of this well-known imputation method while outperforming the traditional variant used in many Machine Learning software tools. To achieve this, the benefits of boosting and of decision tree theory are exploited. To test the efficiency of our method, a series of experiments over 30 datasets was executed, measuring the classification accuracy of the proposed method to show that it outperforms its rivals, which include classic as well as more sophisticated imputation strategies. Finally, the results of our study are provided, along with the conclusions that arise from them.
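The abstract names the ingredients (an iterative, regression-based imputation loop of the kind IRMI belongs to, combined with boosting and decision trees) but does not spell out the algorithm. The following is only a rough sketch of how such a combination could look, not the authors' method: each column with gaps is repeatedly regressed on the other columns, and the regressor is a small gradient-boosted ensemble of one-split decision stumps. All function names (`fit_stump`, `fit_boosted`, `impute`) and parameters (`rounds`, `lr`, `sweeps`) are hypothetical, the data is assumed to be numeric with `None` marking a missing cell, and every column is assumed to have at least one observed value.

```python
from statistics import mean


def fit_stump(X, r):
    """Return (error, feature, threshold, left_mean, right_mean) for the
    one-split stump that best fits residuals r, or None if no split exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = mean(left), mean(right)
            err = (sum((ri - ml) ** 2 for ri in left)
                   + sum((ri - mr) ** 2 for ri in right))
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    return best


def fit_boosted(X, y, rounds=20, lr=0.5):
    """Gradient boosting for squared loss: each stump fits the residuals
    left by the ensemble so far. Returns a prediction function."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, resid)
        if stump is None:  # no usable split (e.g. constant features)
            break
        _, j, t, ml, mr = stump
        stumps.append((j, t, ml, mr))
        pred = [p + lr * (ml if row[j] <= t else mr)
                for row, p in zip(X, pred)]

    def predict(row):
        return base + lr * sum(ml if row[j] <= t else mr
                               for j, t, ml, mr in stumps)

    return predict


def impute(data, sweeps=5):
    """Iteratively re-estimate each missing cell from the other columns."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Start from column means, then refine column by column.
    col_means = [mean(row[j] for row in data if row[j] is not None)
                 for j in range(n_cols)]
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)]
              for row in data]
    for _ in range(sweeps):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            # Train only on rows where column j was actually observed.
            X = [row[:j] + row[j + 1:]
                 for row, m in zip(filled, missing) if not m[j]]
            y = [row[j] for row, m in zip(filled, missing) if not m[j]]
            predict = fit_boosted(X, y)
            for row, m in zip(filled, missing):
                if m[j]:
                    row[j] = predict(row[:j] + row[j + 1:])
    return filled


if __name__ == "__main__":
    rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0],
            [4.0, 8.0], [5.0, None], [6.0, 12.0]]
    print(impute(rows))
```

Observed cells are never overwritten; only cells that were `None` in the input are re-estimated on each sweep. A fixed number of sweeps is used here for brevity, whereas a fuller implementation would likely stop once the imputed values stabilise.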

References

Pages: 74-79

Page count: 6

30 references in total

[1] Acuña E, 2004, ST CLASS DAT ANAL, P639

[2] [Anonymous], 2015, International Journal of Computer Science and Mobile Computing (IJCSMC)

[3] Batista GEAPA, 2003, APPL ARTIF INTELL, V17, P519, DOI 10.1080/08839510390219309

[4] Dealing with missing data in family-based association studies: A multiple imputation approach [J]. Croiseau, Pascal; Genin, Emmanuelle; Cordell, Heather J. HUMAN HEREDITY, 2007, 63 (3-4): 229-238

[5] Demsar J, 2006, J MACH LEARN RES, V7, P1

[6] Dua D., 2017, UCI Machine Learning Repository

[7] Multiple imputation as a flexible tool for missing data handling in clinical research [J]. Enders, Craig K. BEHAVIOUR RESEARCH AND THERAPY, 2017, 98: 4-18

[8] A decision-theoretic generalization of on-line learning and an application to boosting [J]. Freund, Y; Schapire, RE. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 55 (01): 119-139

[9] Additive logistic regression: A statistical view of boosting - Rejoinder [J]. Friedman, J; Hastie, T; Tibshirani, R. ANNALS OF STATISTICS, 2000, 28 (02): 400-407

[10] Gajawada S., 2012, International Journal of Future Computer and Communication, P206, DOI 10.7763/IJFCC.2012.V1.54
