A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering

被引:1
作者
Li, Junnan [1 ,2 ,3 ]
Zhao, Qing [4 ,5 ]
Liu, Shuang [2 ]
机构
[1] Univ Elect Sci & Technol China, Sch Cybersecur, Chengdu 611731, Peoples R China
[2] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
[3] Mashang Consumer Finance Co Ltd, Chongqing 401120, Peoples R China
[4] Chongqing Yubei Data Valley Primary Sch, Chongqing 401120, Peoples R China
[5] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
基金
中国国家自然科学基金;
关键词
Instance-based classifier; Instance reduction; Prototype selection; Data preprocessing; k nearest neighbor; Classification; NEIGHBOR; ALGORITHM;
D O I
10.1007/s11227-023-05885-x
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
The k nearest neighbor (KNN) classifier is one of the well-known instance-based classifiers. Nevertheless, the low efficiency in both running time and memory usage is a great challenge in the KNN classifier and its improvements due to noise and redundant samples. Although hybrid instance reduction approaches have been postulated as a good solution, they still suffer from the following issues: (a) adopted edition methods in existing hybrid instance reduction approaches are susceptible to harmful samples around the tested sample; (b) existing hybrid instance reduction approaches retain many internal samples, which contributes little to the classification accuracy and (or) leading to the low reduction rate; (c) existing hybrid instance reduction approaches rely on more than one parameter. The chief contributions of this article are that (a) a novel heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering (HIRRDKM) is proposed against the above issues; (b) a novel concept, i.e., the adaptive relative distance, is first proposed and calculated for each sample; (c) a novel edition method based on adaptive relative distance in HIRRDKM is second proposed to filter out harmful samples; (d) a novel condensing method based on adaptive relative distance and k-means clustering in HIRRDKM is third proposed to obtain condensed borderline samples from the training set without harmful samples. Experiments have proved that (a) HIRRDKM outperforms 6 state-of-the-art hybrid instance reduction methods on real data sets from various fields in weighing reduction rate and classification accuracy of KNN-based classifiers; (b) the running time of HIRRDKM is competitive.
引用
收藏
页码:13096 / 13123
页数:28
相关论文
共 29 条
[1]   Fast geometrical extraction of nearest neighbors from multi-dimensional data [J].
Aziz, Yasir ;
Memon, Kashif Hussain .
PATTERN RECOGNITION, 2023, 136
[2]   ATISA: Adaptive Threshold-based Instance Selection Algorithm [J].
Cavalcanti, George D. C. ;
Ren, Tsang Ing ;
Pereira, Cesar Lima .
EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (17) :6894-6900
[3]   CONDENSED NEAREST NEIGHBOR RULE [J].
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1968, 14 (03) :515-+
[4]   Distance Encoded Product Quantization for Approximate K-Nearest Neighbor Search in High-Dimensional Space [J].
Heo, Jae-Pil ;
Lin, Zhe ;
Yoon, Sung-Eui .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (09) :2084-2097
[5]   InstanceRank based on borders for instance selection [J].
Hernandez-Leal, Pablo ;
Ariel Carrasco-Ochoa, J. ;
Fco Martinez-Trinidad, J. ;
Arturo Olvera-Lopez, J. .
PATTERN RECOGNITION, 2013, 46 (01) :365-375
[6]   Variable Weighting in Fuzzy k-Means Clustering to Determine the Number of Clusters [J].
Khan, Imran ;
Luo, Zongwei ;
Huang, Joshua Zhexue ;
Shahzad, Waseem .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (09) :1838-1853
[7]   A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean [J].
Kumbure, Mahinda Mailagaha ;
Luukka, Pasi ;
Collan, Mikael .
PATTERN RECOGNITION LETTERS, 2020, 140 :172-178
[8]   Three new instance selection methods based on local sets: A comparative study with several approaches from a bi-objective perspective [J].
Leyva, Enrique ;
Gonzalez, Antonio ;
Perez, Raul .
PATTERN RECOGNITION, 2015, 48 (04) :1523-1537
[9]   A new fast reduction technique based on binary nearest neighbor tree [J].
Li, Juan ;
Wang, Yuping .
NEUROCOMPUTING, 2015, 149 :1647-1657
[10]   A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors [J].
Li, Junnan ;
Zhu, Qingsheng ;
Wu, Quanwang .
APPLIED INTELLIGENCE, 2020, 50 (05) :1527-1541