Class-Balanced Loss Based on Effective Number of Samples

Cited by: 1753
Authors
Cui, Yin [1 ,2 ,5 ]
Jia, Menglin [1 ]
Lin, Tsung-Yi [3 ]
Song, Yang [4 ]
Belongie, Serge [1 ,2 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
[2] Cornell Tech, New York, NY 10044 USA
[3] Google Brain, Mountain View, CA USA
[4] Alphabet Inc, Mountain View, CA USA
[5] Google, Mountain View, CA 94043 USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00949
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point diminishes. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by the simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0, 1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and on large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network achieves significant performance gains on long-tailed datasets.
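As an illustration of the re-weighting scheme the abstract describes (not part of the original record), here is a minimal Python sketch. The function name, the example `beta` values, and the choice to normalize the weights so they sum to the number of classes are assumptions for this sketch; only the effective-number formula $(1-\beta^{n})/(1-\beta)$ comes from the abstract.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples.

    Effective number: E_n = (1 - beta**n) / (1 - beta), as in the abstract.
    Weights are the inverse of E_n; the normalization so they sum to the
    number of classes is an illustrative choice, not taken from the record.
    """
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

# Example: a long-tailed class distribution; rarer classes get larger weights.
counts = [5000, 500, 50, 5]
print(class_balanced_weights(counts, beta=0.99))
```

The resulting vector could then serve as per-class weights in a standard weighted loss (e.g., weighted cross-entropy), which is the sense in which the abstract's scheme "re-balances the loss".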
Pages: 9260-9269
Page count: 10