An evaluation of k-nearest neighbour imputation using Likert data

Cited by: 116

Authors:

Jönsson, P [1]

Wohlin, C [1]

Affiliations:

[1] Blekinge Inst Technol, Sch Engn, SE-38235 Ronneby, Sweden

Source:

10TH INTERNATIONAL SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS | 2004

Keywords:

DOI:

10.1109/METRIC.2004.1357895

Chinese Library Classification:

TP31 [Computer software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Studies in many different fields of research suffer from the problem of missing data. With missing data, statistical tests will lose power, results may be biased, or analysis may not be feasible at all. There are several ways to handle the problem, for example through imputation. With imputation, missing values are replaced with estimated values according to an imputation method or model. In the k-Nearest Neighbour (k-NN) method, a case is imputed using values from the k most similar cases. In this paper we present an evaluation of the k-NN method using Likert data in a software engineering context. We simulate the method with different values of k and for different percentages of missing data. Our findings indicate that it is feasible to use the k-NN method with Likert data. We suggest that a suitable value of k is approximately the square root of the number of complete cases. We also show that by relaxing the method rules with respect to selecting neighbours, the ability of the method remains high for large amounts of missing data without affecting the quality of the imputation.
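
The k-NN imputation idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the distance measure or how donor values are combined, so the Euclidean distance over jointly answered items and the low median of the donors' answers are assumptions, as are the names `distance`, `knn_impute` and the toy `survey` data. Only the default for k (about the square root of the number of complete cases) follows the abstract's suggestion. The sketch draws donors from complete cases only and does not attempt the relaxed neighbour-selection rules the abstract mentions.

```python
import math
from statistics import median_low


def distance(a, b):
    """Euclidean distance over the items answered in both cases (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(cases, k=None):
    """Return a copy of `cases` with missing answers (None) filled from complete cases."""
    complete = [c for c in cases if None not in c]
    if not complete:
        raise ValueError("need at least one complete case")
    if k is None:
        # Heuristic from the abstract: k is roughly sqrt(number of complete cases).
        k = max(1, round(math.sqrt(len(complete))))

    imputed = []
    for case in cases:
        if None not in case:
            imputed.append(list(case))
            continue
        # sorted() is stable, so equally distant donors keep their original order.
        donors = sorted(complete, key=lambda d: distance(case, d))[:k]
        filled = [
            value if value is not None
            # Assumption: the low median keeps the result on the Likert scale.
            else median_low([d[i] for d in donors])
            for i, value in enumerate(case)
        ]
        imputed.append(filled)
    return imputed


# Hypothetical 5-point Likert answers; None marks a missing answer.
survey = [
    [1, 2, 1, 2],
    [2, 1, 2, 1],
    [5, 4, 5, 4],
    [4, 5, 4, 5],
    [5, 5, None, 4],
]
result = knn_impute(survey)
print(result[4])
```

With four complete cases the default is k = 2, so the incomplete case borrows its missing answer from the two respondents whose other answers are closest to its own.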


Pages: 108-118

Page count: 11
