VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Cited by: 15
Authors
Tian, Changyao [1 ,4 ]
Wang, Wenhai [3 ]
Zhu, Xizhou [2 ]
Dai, Jifeng [2 ]
Qiao, Yu [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
Source
COMPUTER VISION, ECCV 2022, PT XXV | 2022 / Vol. 13685
Keywords
Long-tailed recognition; Vision-language models; SMOTE
DOI
10.1007/978-3-031-19806-9_5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be proven. In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition tasks. Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) our method can effectively use the learned visual-linguistic representation to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set new state-of-the-art performance on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points and is close to the prevailing performance of models trained on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
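The abstract's core idea, classifying an image feature against per-class linguistic representations, follows the general CLIP-style mechanism of scoring a visual embedding against text embeddings. The following is a minimal illustrative sketch of that mechanism only, not the actual VL-LTR implementation; all shapes, names, and the use of random vectors are assumptions for demonstration.

```python
import numpy as np

# Illustrative sketch (not the VL-LTR code): a CLIP-style classifier head
# that scores one image feature against per-class text embeddings.
rng = np.random.default_rng(0)
num_classes, dim = 5, 8

# Per-class text embeddings, e.g. derived from noisy web descriptions of
# each class (here random stand-ins), L2-normalized row-wise.
text_emb = rng.normal(size=(num_classes, dim))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# An image feature from a visual encoder, also L2-normalized.
img_feat = rng.normal(size=dim)
img_feat /= np.linalg.norm(img_feat)

# Cosine-similarity logits; the prediction is the best-matching class text.
logits = text_emb @ img_feat
pred = int(np.argmax(logits))
```

Because every class, however few training images it has, gets a text-derived embedding, such a head can support tail classes better than a purely image-trained classifier, which is the benefit the abstract highlights.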
Pages: 73-91
Page count: 19