LM-CLIP: Adapting Positive Asymmetric Loss for Long-Tailed Multi-Label Classification

Cited by: 1
Authors
Timmermann, Christoph [1 ]
Jung, Seunghyeon [1 ]
Kim, Miso [1 ]
Lee, Woojin [1 ]
Affiliations
[1] Dongguk Univ, Grad Sch Comp Sci & Artificial Intelligence, Seoul 04620, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Heavily-tailed distribution; Head; Tail; Multi label classification; Training; Visualization; Adaptation models; Focusing; Tuning; Optimization; Long-tailed learning; multi-label classification; CLIP; vision-language models; contrastive learning; class imbalance; loss functions; asymmetric loss; balanced asymmetric loss; imbalanced sampling;
DOI
10.1109/ACCESS.2025.3561581
CLC Number
TP [Automation Technology; Computer Technology];
Subject Classification Code
0812;
Abstract
Accurate multi-label image classification is essential for real-world applications, especially in scenarios with long-tailed class distributions, where some classes appear frequently while others are rare. This imbalance often leads to biased models that struggle to recognize underrepresented classes. Existing methods either trade off performance between head and tail classes or rely on image captions, limiting adaptability. To address these limitations, we propose LM-CLIP, a novel framework built around a unified loss function. Our Balanced Asymmetric Loss (BAL) extends traditional asymmetric loss by emphasizing the gradients of rare positive samples where the model is uncertain, mitigating bias toward dominant classes. This is complemented by a contrastive loss that pushes negative samples further from the decision boundary, yielding a better-structured embedding space even in long-tailed scenarios. Together, these loss functions ensure balanced performance across all classes. Our framework builds on pre-trained models that exploit textual and visual features learned from millions of image-text pairs. Furthermore, we incorporate a dynamic sampling strategy that prioritizes rare classes based on their occurrence, ensuring effective training without compromising overall performance. Experiments on the VOC-MLT and COCO-MLT benchmarks demonstrate the effectiveness of our approach, achieving +4.66% and +8.14% improvements in mean Average Precision (mAP) over state-of-the-art methods. Our code is publicly available at https://github.com/damilab/lm-clip.
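The record does not give the exact BAL formulation, but the abstract describes an asymmetric loss (separate focusing for positives and negatives, in the style of Ridnik et al.'s ASL) combined with a weighting that emphasizes rare positive classes. A minimal NumPy sketch under those assumptions follows; the parameter names (`gamma_pos`, `gamma_neg`, `clip`, `class_freq`) and the inverse-frequency weighting are illustrative, not the paper's definitions.

```python
import numpy as np

def balanced_asymmetric_loss(logits, targets, class_freq,
                             gamma_pos=1.0, gamma_neg=4.0, clip=0.05):
    """Sketch of an asymmetric multi-label loss with a hypothetical
    per-class rarity weight. gamma_neg > gamma_pos down-weights easy
    negatives; low-frequency (tail) classes receive larger weights.
    The paper's actual BAL may differ in form and parameters."""
    p = 1.0 / (1.0 + np.exp(-logits))            # sigmoid probabilities
    p_neg = np.clip(p - clip, 0.0, 1.0)          # probability shift for negatives
    eps = 1e-8
    # Asymmetric focusing: separate exponents for positive/negative terms.
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(p + eps)
    loss_neg = (1.0 - targets) * p_neg ** gamma_neg * np.log(1.0 - p_neg + eps)
    # Hypothetical rarity weighting: normalized inverse class frequency.
    w = 1.0 / np.asarray(class_freq, dtype=float)
    w = w / w.mean()
    return -np.mean(w * (loss_pos + loss_neg))
```

With this weighting, missing a positive of a tail class (frequency 5) costs substantially more than missing one of a head class (frequency 100), which is the qualitative behavior the abstract attributes to BAL.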
Pages: 71053-71065
Page count: 13