A fuzzy K-nearest neighbor classifier to deal with imperfect data

Citations: 17
Authors
Cadenas, Jose M. [1 ]
Carmen Garrido, M. [1 ]
Martinez, Raquel [2 ]
Munoz, Enrique [3 ]
Bonissone, Piero P. [4 ]
Affiliations
[1] Univ Murcia, Dept Informat & Commun Engn, Murcia, Spain
[2] Catholic Univ Murcia, Dept Comp Engn, Murcia, Spain
[3] Univ Milan, Dept Comp Sci, Crema, Italy
[4] Piero P Bonissone Analyt LLC, San Diego, CA USA
Keywords
k-nearest neighbors; Classification; Imperfect data; Distance/dissimilarity measures; Combination methods; PERFORMANCE; RULES; ALGORITHMS;
DOI
10.1007/s00500-017-2567-x
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The k-nearest neighbors method (kNN) is a nonparametric, instance-based method used for regression and classification. To classify a new instance, the kNN method computes its k nearest neighbors and generates a class value from them. Usually, this method requires that the information available in the datasets be precise and accurate, except for the existence of missing values. However, data imperfection is inevitable when dealing with real-world scenarios. In this paper, we present the kNN(imp) classifier, a k-nearest neighbors method that performs classification from datasets with imperfect values. The importance of each neighbor in the output decision is based on its relative distance and its degree of imperfection. Furthermore, through external parameters, the classifier lets the user define the maximum allowed imperfection and decide whether the final output should be derived solely from the class with the greatest weight (the best class) or from a weighted combination of the best class and the classes closest to it. To test the proposed method, we performed several experiments with both synthetic and real-world datasets containing imperfect data. The results, validated through statistical tests, show that the kNN(imp) classifier is robust when working with imperfect data and maintains good performance compared with other methods in the literature applied to datasets with or without imperfection.
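The idea of discounting each neighbor's vote by both its distance and its degree of imperfection can be sketched as follows. This is an illustrative toy implementation, not the paper's actual kNN(imp) algorithm: the function name `knn_imp_sketch`, the scalar "imperfection degree" per instance, and the particular weight `1/(1+d) * (1-imp)` are all assumptions made for the example; the paper uses its own distance/dissimilarity measures and combination methods for imperfect attribute values.

```python
import math
from collections import defaultdict

def knn_imp_sketch(train, query, k=3, max_imperfection=0.5):
    """Toy imperfection-aware kNN vote (illustrative only).

    train: iterable of (features, label, imperfection) where
           imperfection is a scalar in [0, 1] (0 = fully reliable).
    max_imperfection: external parameter excluding neighbors that
                      are too imperfect to be trusted at all.
    """
    # Keep only instances whose imperfection is within the allowed bound,
    # scored by Euclidean distance to the query.
    scored = []
    for features, label, imp in train:
        if imp > max_imperfection:
            continue
        scored.append((math.dist(features, query), label, imp))
    scored.sort(key=lambda t: t[0])

    # Each of the k nearest neighbors votes with a weight that decays
    # with distance and is discounted by its imperfection degree.
    votes = defaultdict(float)
    for d, label, imp in scored[:k]:
        votes[label] += (1.0 / (1.0 + d)) * (1.0 - imp)

    # Return the greatest-weight ("best") class.
    return max(votes, key=votes.get)
```

With this weighting, a very close but highly imperfect neighbor contributes less than a slightly farther, reliable one, which is the intuition behind combining relative distance with the degree of imperfection.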
Pages: 3313 - 3330
Page count: 18
Related Papers
50 records in total
  • [21] A Training Data Set Cleaning Method by Classification Ability Ranking for the k-Nearest Neighbor Classifier
    Wang, Yidi
    Pan, Zhibin
    Pan, Yiwei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (05) : 1544 - 1556
  • [22] A new globally adaptive k-nearest neighbor classifier based on local mean optimization
    Pan, Zhibin
    Pan, Yiwei
    Wang, Yidi
    Wang, Wei
    SOFT COMPUTING, 2021, 25 (03) : 2417 - 2431
  • [23] Theoretical Analysis of Cross-Validation for Estimating the Risk of the k-Nearest Neighbor Classifier
    Celisse, Alain
    Mary-Huard, Tristan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
  • [24] Boosted K-nearest neighbor classifiers based on fuzzy granules
    Li, Wei
    Chen, Yumin
    Song, Yuping
    KNOWLEDGE-BASED SYSTEMS, 2020, 195
  • [25] Dynamic data structures for k-nearest neighbor queries
    de Berg, Sarita
    Staals, Frank
    COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 2023, 111
  • [26] EEkNN: k-Nearest Neighbor Classifier with an Evidential Editing Procedure for Training Samples
    Jiao, Lianmeng
    Geng, Xiaojiao
    Pan, Quan
    ELECTRONICS, 2019, 8 (05)
  • [27] Attention-based Local Mean K-Nearest Centroid Neighbor Classifier
    Ma, Ying
    Huang, Rui
    Yan, Ming
    Li, Guoqi
    Wang, Tian
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 201
  • [28] An Evidential K-Nearest Neighbor Classifier Based on Contextual Discounting and Likelihood Maximization
    Kanjanatarakul, Orakanya
    Kuson, Siwarat
    Denoeux, Thierry
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS, BELIEF 2018, 2018, 11069 : 155 - 162
  • [29] Recognition of driving postures by nonsubsampled contourlet transform and k-nearest neighbor classifier
    Zhao, Chihang
    He, Jie
    Zhang, Xiaoqin
    Qi, Xingzhi
    Chen, Aiwen
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2015, 30 (03) : 233 - 241
  • [30] FINkNN: A fuzzy interval number k-nearest neighbor classifier for prediction of sugar production from populations of samples
    Petridis, V
    Kaburlasos, VG
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 4 (01) : 17 - 37