VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15
Authors
Tian, Changyao [1 ,4 ]
Wang, Wenhai [3 ]
Zhu, Xizhou [2 ]
Dai, Jifeng [2 ]
Qiao, Yu [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
Source
COMPUTER VISION, ECCV 2022, PT XXV | 2022 / Vol. 13685
Keywords
Long-tailed recognition; Vision-language models; SMOTE
DOI
10.1007/978-3-031-19806-9_5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be proven. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) our method can effectively use the learned visual-linguistic representation to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set new state-of-the-art performance on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points and is close to the prevailing performance of models trained on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
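The abstract's core idea, classifying an image feature against per-class linguistic representations, follows the general CLIP-style mechanism of scoring a visual embedding against text embeddings. The following is a minimal illustrative sketch of that mechanism only, not the actual VL-LTR implementation; all shapes, names, and the use of random vectors are assumptions for demonstration.

```python
import numpy as np

# Illustrative sketch (not the VL-LTR code): a CLIP-style classifier head
# that scores one image feature against per-class text embeddings.
rng = np.random.default_rng(0)
num_classes, dim = 5, 8

# Per-class text embeddings, e.g. derived from noisy web descriptions of
# each class (here random stand-ins), L2-normalized row-wise.
text_emb = rng.normal(size=(num_classes, dim))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# An image feature from a visual encoder, also L2-normalized.
img_feat = rng.normal(size=dim)
img_feat /= np.linalg.norm(img_feat)

# Cosine-similarity logits; the prediction is the best-matching class text.
logits = text_emb @ img_feat
pred = int(np.argmax(logits))
```

Because every class, however few training images it has, gets a text-derived embedding, such a head can support tail classes better than a purely image-trained classifier, which is the benefit the abstract highlights.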
Pages: 73-91
Page count: 19