Reverse-nearest neighborhood based oversampling for imbalanced, multi-label datasets

Cited by: 28
Authors
Sadhukhan, Payel [1]
Palit, Sarbani [2]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
[2] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India
Keywords
Reverse nearest neighborhood; Multi-label classification; Multi-label learning; Class-imbalance; Oversampling; Feature selection; Classification
DOI
10.1016/j.patrec.2019.08.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present a novel reverse-nearest-neighborhood-based oversampling scheme for the imbalanced labels of a multi-label dataset. The reverse nearest neighborhood of a query point consists of all points that have the query point as one of their nearest neighbors. It lets us identify an adaptive number of neighbors (according to the density and distribution of the points) instead of a fixed number. For each label, we add label-specific synthetic minority instances in the reverse nearest neighborhoods of that label's minority points. The reverse nearest neighbor configuration also detects singular minority points, which we avoid using as seed points in the oversampling phase. On the oversampled data of each label, we train a linear support vector machine and use it for testing. Results of the proposed method against competing methods on class-imbalance-focused metrics indicate its competence in handling multi-label datasets with differing degrees of imbalance. © 2019 Elsevier B.V. All rights reserved.
Pages: 813-820
Number of pages: 8