On the suitability of Prototype Selection methods for kNN classification with distributed data

Cited by: 13
Authors
Valero-Mas, Jose J. [1]
Calvo-Zaragoza, Jorge [1]
Rico-Juan, Juan R. [1]
Affiliations
[1] Univ Alicante, Dept Software & Comp Syst, Carretera San Vicente Raspeig S-N, Alicante 03690, Spain
Keywords
Prototype Selection; Distributed data; k-Nearest Neighbour; Experimental study; NEAREST-NEIGHBOR RULE; REDUCTION TECHNIQUES; LEARNING ALGORITHMS; DATA SETS; BIG DATA; INSTANCE;
DOI
10.1016/j.neucom.2016.04.018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition, so that classic and common algorithms, such as the k-Nearest Neighbour, cannot be used. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance under these conditions is therefore unknown. This work carries out an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results show that performance generally degrades as conditions approach more realistic scenarios. However, our experiments also show that some methods are able to achieve performance fairly similar to that of the non-distributed scenario. Thus, although there is a clear need for developing specific PS methodologies and algorithms for tackling these situations, those that showed higher robustness against such conditions may be good candidates from which to start. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 150-160
Page count: 11