Borderline-margin loss based deep metric learning framework for imbalanced data

Cited by: 8
Authors
Yan, Mi [1 ,2 ]
Li, Ning [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Minist Educ China, Dept Automat, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[2] Shanghai Engn Res Ctr Intelligent Control & Manag, Shanghai 200240, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Imbalanced classification; Class imbalance; Class overlap; Deep metric framework; Borderline-margin loss; SMOTE; CLASSIFICATION; MACHINE; COST;
DOI
10.1007/s10489-022-03494-4
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Imbalanced data suffer from the problem that the minority class is under-represented compared with the majority classes. Traditional imbalanced learning algorithms consider only the class imbalance while ignoring class overlap, which leads to undesirable accuracy for minority samples in overlapping regions. To address this issue, we propose a deep metric framework with borderline-margin loss (DMFBML) for improving intra-class coherence and inter-class difference in overlapping regions. First, a flexible borderline margin is designed for each minority sample and is adaptively adjusted according to the labels in its neighborhood. The proposed margin makes it possible to discriminate minority samples with varying degrees of overlap, which preserves the valuable information of the classification boundary. The input data are then reconstructed into a set of training triplets to generate more metric constraints for minority samples, thereby increasing the inter-class difference in overlapping regions. Finally, a neural network with DMFBML is presented to achieve better classification performance on imbalanced data. The proposed method is verified by comparative experiments on six synthetic datasets and eleven real-world datasets.
Pages: 1487-1504
Number of pages: 18
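The abstract outlines the core mechanism: each minority sample receives an adaptive borderline margin driven by the class composition of its neighborhood, and the data are recast as triplets for deep metric learning. The PyTorch sketch below illustrates one plausible reading of that idea only; the function names, the k-nearest-neighbor margin rule, and the hyperparameters (k, base_margin, max_margin) are assumptions made for illustration and do not reproduce the paper's exact DMFBML formulation.

```python
# Illustrative sketch only: a hypothetical borderline-margin triplet loss.
# The margin rule and all names/defaults below are assumptions, not the
# authors' published DMFBML formulation.
import torch
import torch.nn.functional as F


def borderline_margins(embeddings, labels, minority_label, k=5,
                       base_margin=0.2, max_margin=1.0):
    """Per-anchor margin that grows with the fraction of differently labelled
    neighbours, so minority anchors in overlapping regions get a larger margin."""
    with torch.no_grad():                               # margins act as per-sample constants
        dists = torch.cdist(embeddings, embeddings)     # pairwise Euclidean distances
        dists.fill_diagonal_(float("inf"))              # a point is not its own neighbour
        knn_idx = dists.topk(k, largest=False).indices  # k nearest neighbours of each sample
        overlap = (labels[knn_idx] != labels.unsqueeze(1)).float().mean(dim=1)
        margins = base_margin + (max_margin - base_margin) * overlap
        # Only minority anchors receive the adaptive (borderline) margin.
        return torch.where(labels == minority_label, margins,
                           torch.full_like(margins, base_margin))


def borderline_triplet_loss(embeddings, labels, minority_label, k=5):
    """Naive all-triplets hinge loss using the per-anchor adaptive margin."""
    margins = borderline_margins(embeddings, labels, minority_label, k)
    d = torch.cdist(embeddings, embeddings)
    loss, count = embeddings.new_zeros(()), 0
    for a in range(len(labels)):
        pos = (labels == labels[a]).nonzero(as_tuple=True)[0]
        neg = (labels != labels[a]).nonzero(as_tuple=True)[0]
        if len(neg) == 0:
            continue
        for p in pos:
            if p == a:
                continue
            # One (anchor, positive) pair against every negative of the anchor.
            hinge = F.relu(d[a, p] - d[a, neg] + margins[a])
            loss, count = loss + hinge.sum(), count + hinge.numel()
    return loss / max(count, 1)


# Toy usage: 2-D embeddings with an imbalanced binary labelling (1 = minority).
emb = torch.randn(32, 2, requires_grad=True)
lab = torch.cat([torch.zeros(26, dtype=torch.long), torch.ones(6, dtype=torch.long)])
loss = borderline_triplet_loss(emb, lab, minority_label=1, k=5)
loss.backward()  # gradients reach the embeddings / the network that produced them
```

In this sketch the per-anchor margin simply grows linearly with the fraction of opposite-class neighbors, which is only one way to encode the "borderline" notion; the paper derives its margin and triplet construction from its own formulation.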