Long-tailed visual classification based on supervised contrastive learning with multi-view fusion

Cited by: 1
Authors
Zeng, Liang [1 ,2 ,3 ]
Feng, Zheng [1 ]
Chen, Jia [1 ]
Wang, Shanshan [1 ,2 ,3 ]
Affiliations
[1] Hubei Univ Technol, Sch Elect & Elect Engn, Wuhan, Hubei, Peoples R China
[2] Hubei Univ Technol, Hubei Key Lab High efficiency Utilizat Solar Energ, Wuhan, Hubei, Peoples R China
[3] Hubei Univ Technol, Xiangyang Ind Inst, Xiangyang, Hubei, Peoples R China
Keywords
Long-tailed classification; Deep learning; Feature fusion; Contrastive learning
DOI
10.1016/j.knosys.2024.112301
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The vast majority of real-world data follows a long-tailed distribution, in which head classes contain a large number of samples while tail classes contain only a few. For long-tailed visual classification, two-stage training outperforms end-to-end training; in practical applications, however, one-stage end-to-end models prevail because they are easier to deploy. Recently, supervised contrastive learning has been applied to the long-tailed setting with notable success. Both approaches aim to mitigate the repulsive influence of the dominant classes while striving for an equitable distribution of all classes across the hypersphere. Building on the former line of work, we find that assigning each class a dynamically adjusted weighting factor, with the classification-layer weights serving as prior knowledge, increases the number of negative sample pairs for the tail classes, thereby raising the model's attention to them and improving contrastive accuracy. To further improve tail-class accuracy and the model's generalization ability, this paper proposes a supervised contrastive learning network based on multi-view compensation feature fusion. Multi-view input allows comprehensive representation information to be incorporated into the classification network, enriching the semantic understanding of samples in the contrastive learning network; combined with the dynamically weighted balanced loss function, this improves tail accuracy. With a small batch size and an imbalance factor of 0.01, the proposed network achieves average Top-1 accuracies of 83.293% on CIFAR-10-LT and 55.092% on CIFAR-100-LT, yielding significant results.
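The abstract describes weighting a supervised contrastive loss per class, with the classification-layer weights acting as prior knowledge so that tail classes receive more attention. The paper's exact formulation is not given in the abstract, so the following is only a minimal sketch: the weight derivation `class_weights_from_classifier` (up-weighting classes with small classifier-weight norms) and the placement of the weight on each anchor's loss term are assumptions.

```python
import torch
import torch.nn.functional as F


def class_weights_from_classifier(W: torch.Tensor) -> torch.Tensor:
    """Hypothetical prior: tail classes tend to get smaller classifier
    weight norms, so weight each class inversely to its norm."""
    norms = W.norm(dim=1)                              # (C,)
    w = norms.max() / norms.clamp(min=1e-8)            # larger for tail classes
    return w / w.mean()                                # normalize to mean 1


def weighted_supcon_loss(features: torch.Tensor,
                         labels: torch.Tensor,
                         class_weights: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss (Khosla et al., 2020) with a per-class
    weighting factor applied to each anchor's term.

    features: (N, D) L2-normalized embeddings; labels: (N,) class ids;
    class_weights: (C,) per-class scaling factors.
    """
    n = features.size(0)
    sim = features @ features.t() / temperature        # (N, N) similarity logits
    not_self = ~torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & not_self
    # log-softmax over all other samples (self excluded from the denominator)
    sim = sim.masked_fill(~not_self, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_count
    per_anchor = -class_weights[labels] * mean_log_prob_pos
    has_pos = pos_mask.any(1)                          # anchors with a positive pair
    return per_anchor[has_pos].mean()
```

In this sketch the weights are fixed within a batch; a dynamic scheme in the spirit of the abstract would recompute them from the current classification-layer weights at each training step.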
Pages: 13