Normalized class coherence change-based kNN for classification of imbalanced data

Cited by: 23
Author
Kim, Kyoungok [1 ]
Affiliation
[1] Seoul Natl Univ Sci & Technol SeoulTech, Int Fus Sch, Informat Technol Management Programme, 232 Gongreungno, Seoul 01811, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Nearest neighbor classification; Imbalanced data; Class coherence; kNN; NEAREST-NEIGHBOR RULE; ALGORITHMS; SELECTION;
DOI
10.1016/j.patcog.2021.108126
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
kNN is a machine learning algorithm widely used across many domains because of its simplicity and its fairly good performance in practice. This study aims to enhance the performance of kNN on imbalanced datasets, a topic that has received relatively little attention in kNN research. The proposed algorithm, called normalized class coherence change-based k-nearest neighbor (NCC-kNN), determines the label of a test sample by computing the normalized class coherence change, at both the class and sample levels, for every candidate class and assigning the sample to the class with the maximum value. It exploits the tendency of minority classes to show lower class coherence than the majority class. NCC-kNN also uses an adaptive k for the class coherence, computed in a weighted manner to reduce sensitivity to the choice of k. NCC-kNN was applied to 20 benchmark datasets with varying class imbalance and coherence, and its performance was compared with that of five kNN algorithms, as well as SMOTE and MetaCost with standard kNN as the base classifier. NCC-kNN outperformed the other kNN algorithms in the classification of imbalanced data, especially imbalanced data with low positive-class coherence. (c) 2021 Elsevier Ltd. All rights reserved.
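The record does not include the paper's formulas, but the decision rule described in the abstract can be sketched under simplifying assumptions. In the Python sketch below, class coherence is taken to be the fraction of a sample's k nearest neighbours that share its label, and the candidate class is chosen by the size-normalized coherence change it induces; the function names `coherence` and `ncc_knn_predict` are illustrative, and the paper's actual definitions (including the adaptive, weighted k) differ.

```python
import numpy as np

def coherence(X, y, k):
    """Per-sample class coherence: fraction of each sample's k nearest
    neighbours (excluding itself) that share its label (assumed proxy
    for the paper's coherence measure)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]     # indices of k nearest neighbours
    return np.array([np.mean(y[nn[i]] == y[i]) for i in range(len(X))])

def ncc_knn_predict(X, y, x_new, k=3):
    """Hypothetical sketch of the NCC-kNN decision rule: tentatively
    assign x_new to each candidate class, measure the change in that
    class's mean coherence, and pick the class with the maximum change."""
    base = coherence(X, y, k)
    scores = {}
    for c in np.unique(y):
        Xa = np.vstack([X, x_new])
        ya = np.append(y, c)
        coh = coherence(Xa, ya, k)
        # comparing per-class means normalizes the change by class size,
        # so small (minority) classes are not dominated by the majority
        scores[c] = coh[ya == c].mean() - base[y == c].mean()
    return max(scores, key=scores.get)

# toy imbalanced data: four majority points near the origin, one minority point
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [5., 5.]])
y = np.array([0, 0, 0, 0, 1])
print(ncc_knn_predict(X, y, np.array([5., 4.]), k=3))  # → 1 (minority class)
```

The key difference from plain kNN is visible on this toy set: a majority vote over the three nearest neighbours of `[5, 4]` would return class 0, while the coherence-change criterion assigns it to the minority class, whose coherence it raises.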
Pages: 11
Related papers
35 records in total
[1]   Applying support vector machines to imbalanced datasets [J].
Akbani, R ;
Kwek, S ;
Japkowicz, N .
MACHINE LEARNING: ECML 2004, PROCEEDINGS, 2004, 3201 :39-50
[2]  
[Anonymous], 2010, P 2010 INT JOINT C N, DOI DOI 10.1109/IJCNN.2010.5596486
[3]  
[Anonymous], 2012, Advances in Neural Information Processing Systems 25
[4]  
Batista G, 2004, ACM SIGKDD Explor Newsl, V6, P20, DOI DOI 10.1145/1007730.1007735
[5]   Locally adaptive k parameter selection for nearest neighbor classifier: one nearest cluster [J].
Bulut, Faruk ;
Amasyali, Mehmet Fatih .
PATTERN ANALYSIS AND APPLICATIONS, 2017, 20 (02) :415-425
[6]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 :321-357
[7]   Locally adaptive metric nearest-neighbor classification [J].
Domeniconi, C ;
Peng, J ;
Gunopulos, D .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (09) :1281-1285
[8]  
Domingos P., 1999, P ACM SIGKDD INT C K, DOI DOI 10.1145/312129.312220
[9]  
Dua D., 2017, UCI machine learning repository
[10]  
Elkan C., 2001, INT JOINT C ART INT, P973