A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering
Cited by: 1
Authors:
Li, Junnan [1,2,3]
Zhao, Qing [4,5]
Liu, Shuang [2]
Affiliations:
[1] Univ Elect Sci & Technol China, Sch Cybersecur, Chengdu 611731, Peoples R China
[2] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
[3] Mashang Consumer Finance Co Ltd, Chongqing 401120, Peoples R China
[4] Chongqing Yubei Data Valley Primary Sch, Chongqing 401120, Peoples R China
[5] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Instance-based classifier;
Instance reduction;
Prototype selection;
Data preprocessing;
k nearest neighbor;
Classification;
NEIGHBOR;
ALGORITHM;
DOI:
10.1007/s11227-023-05885-x
Chinese Library Classification (CLC) number:
TP3 [Computing technology, computer technology];
Discipline classification code:
0812;
Abstract:
The k nearest neighbor (KNN) classifier is one of the best-known instance-based classifiers. However, noisy and redundant samples make the KNN classifier and its improved variants inefficient in both running time and memory usage. Hybrid instance reduction approaches have been proposed as a solution, but they still suffer from the following issues: (a) the edition methods they adopt are susceptible to harmful samples around the tested sample; (b) they retain many internal samples, which contribute little to classification accuracy and (or) lead to a low reduction rate; (c) they rely on more than one parameter. The main contributions of this article are as follows: (a) a novel heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clustering (HIRRDKM) is proposed to address the above issues; (b) a new concept, the adaptive relative distance, is proposed and calculated for each sample; (c) a novel edition method based on adaptive relative distance is proposed in HIRRDKM to filter out harmful samples; (d) a novel condensing method based on adaptive relative distance and k-means clustering is proposed in HIRRDKM to obtain condensed borderline samples from the training set after harmful samples have been removed. Experiments show that (a) HIRRDKM outperforms 6 state-of-the-art hybrid instance reduction methods on real data sets from various fields in balancing reduction rate against the classification accuracy of KNN-based classifiers; (b) the running time of HIRRDKM is competitive.
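For readers unfamiliar with the edition-then-condensing structure that hybrid instance reduction refers to, the following is a minimal Python sketch. It is not HIRRDKM: the abstract does not define the adaptive relative distance, so the sketch substitutes two classic stand-ins, Wilson's edited nearest neighbor (ENN) for the edition step and Hart's condensed nearest neighbor (CNN) for the condensing step. All function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k, skip=None):
    """Majority label among the k training samples nearest to `query`."""
    neighbors = sorted(
        (math.dist(x, query), i, label)
        for i, (x, label) in enumerate(train)
        if i != skip  # optionally leave one sample out
    )[:k]
    return Counter(label for _, _, label in neighbors).most_common(1)[0][0]


def edit_enn(train, k=3):
    """Edition stand-in (Wilson's ENN): drop samples that their own
    k nearest neighbors misclassify, i.e. likely noise."""
    return [s for i, s in enumerate(train)
            if knn_predict(train, s[0], k, skip=i) == s[1]]


def condense_cnn(train):
    """Condensing stand-in (Hart's CNN): grow a subset by adding only the
    samples that the current subset misclassifies with 1-NN."""
    kept = [train[0]]
    changed = True
    while changed:
        changed = False
        for s in train:
            if s not in kept and knn_predict(kept, s[0], 1) != s[1]:
                kept.append(s)
                changed = True
    return kept


def reduction_rate(original, reduced):
    """Fraction of the original training set that was removed."""
    return 1 - len(reduced) / len(original)


# Hypothetical toy data: two well-separated classes plus one mislabeled point.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
    ((0.5, 0.5), "b"),  # noise: labeled "b" but sits inside class "a"
]

edited = edit_enn(train, k=3)      # edition: remove harmful samples
reduced = condense_cnn(edited)     # condensing: remove redundant samples
print(len(train), len(edited), len(reduced), reduction_rate(train, reduced))
```

The sketch also shows the trade-off the abstract measures: a higher reduction rate shrinks the set a KNN classifier must search, and the reduced set is only useful if classification accuracy on unseen samples is preserved.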
Pages: 13096-13123
Number of pages: 28