An evaluation of k-nearest neighbour imputation using Likert data

Cited by: 116
Authors
Jönsson, P [1 ]
Wohlin, C [1 ]
Affiliations
[1] Blekinge Inst Technol, Sch Engn, SE-38235 Ronneby, Sweden
Source
10TH INTERNATIONAL SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS | 2004
DOI
10.1109/METRIC.2004.1357895
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Studies in many different fields of research suffer from the problem of missing data. With missing data, statistical tests will lose power, results may be biased, or analysis may not be feasible at all. There are several ways to handle the problem, for example through imputation. With imputation, missing values are replaced with estimated values according to an imputation method or model. In the k-Nearest Neighbour (k-NN) method, a case is imputed using values from the k most similar cases. In this paper we present an evaluation of the k-NN method using Likert data in a software engineering context. We simulate the method with different values of k and for different percentages of missing data. Our findings indicate that it is feasible to use the k-NN method with Likert data. We suggest that a suitable value of k is approximately the square root of the number of complete cases. We also show that by relaxing the method rules with respect to selecting neighbours, the ability of the method remains high for large amounts of missing data without affecting the quality of the imputation.
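The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Euclidean distance over the answered items, the restriction of donors to complete cases, and the use of the low median to keep imputed values on the Likert scale are all assumptions made here for illustration, as are the names `suggested_k` and `knn_impute`. Only the rule of thumb for k (roughly the square root of the number of complete cases) comes from the abstract.

```python
import math
from statistics import median_low


def suggested_k(n_complete):
    """Rule of thumb from the abstract: k is about sqrt(number of complete cases)."""
    return max(1, round(math.sqrt(n_complete)))


def knn_impute(cases, k=None):
    """Return a copy of `cases` in which None entries are filled from the
    k most similar complete cases. Each case is a list of Likert answers."""
    # Assumption: only complete cases may act as donors.
    complete = [c for c in cases if None not in c]
    if not complete:
        raise ValueError("need at least one complete case to act as a donor")
    if k is None:
        k = suggested_k(len(complete))

    imputed = []
    for case in cases:
        if None not in case:
            imputed.append(list(case))
            continue

        answered = [i for i, v in enumerate(case) if v is not None]

        # Assumption: Euclidean distance over the items this case answered.
        def distance(donor):
            return math.sqrt(sum((case[i] - donor[i]) ** 2 for i in answered))

        neighbours = sorted(complete, key=distance)[:k]

        filled = list(case)
        for i, value in enumerate(case):
            if value is None:
                # Assumption: the low median keeps the value on the Likert scale.
                filled[i] = median_low(n[i] for n in neighbours)
        imputed.append(filled)
    return imputed


# Hypothetical five-point Likert answers; None marks a missing answer.
survey = [
    [1, 2, 1, 2],
    [2, 1, 2, 1],
    [5, 4, 5, 4],
    [4, 5, 4, 5],
    [5, 5, 4, None],
]
result = knn_impute(survey)
```

With four complete cases the rule of thumb gives k = 2, so the missing answer in the last case is filled from its two closest complete cases. Other distance measures or aggregation rules (for example the mode) could be substituted without changing the overall structure.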
Pages: 108-118
Page count: 11