Long-tailed video recognition via majority-guided diffusion model

Citations: 0
Authors
Hu, Yufan [1]
Zhang, Yi [1]
Zhang, Lixin [1]
Affiliation
[1] Univ Sci & Technol Beijing, Sch Intelligence Sci & Technol, Beijing 100083, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Long-tailed distribution; Diffusion model; Video classification; Imbalanced learning
DOI
10.1007/s00530-024-01624-1
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Long-tailed video recognition presents a significant challenge due to the imbalanced distribution of samples across different classes, where majority classes contain abundant samples, and minority classes are severely underrepresented. Existing methods primarily focus on resampling, reweighting, and architectural modifications, which do not fully leverage the rich information contained within majority class samples. Motivated by this, we propose a novel majority-guided diffusion model for addressing the long-tailed distribution problem in video recognition. Specifically, we introduce an attention-based feature mix module (AFM) to blend majority and minority class information, followed by a minority-class data generator (MDG) that synthesizes diverse minority class samples using a latent diffusion model. By leveraging the rich information from majority class samples, our method generates realistic minority class samples that improve the overall model performance on underrepresented categories. Extensive experimental results on long-tailed video recognition benchmarks validate the effectiveness of the proposed framework.
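The abstract does not specify how the attention-based feature mix module (AFM) is built, so the following is only a minimal sketch of one generic way to blend a minority-class feature with majority-class features using scaled dot-product attention. The function name `attention_mix`, the `mix_ratio` parameter, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mix(minority, majority_bank, mix_ratio=0.3):
    """Blend one minority feature with attention-pooled majority features.

    Hypothetical illustration only; not the AFM described in the paper.
    """
    dim = len(minority)
    scale = math.sqrt(dim)
    # The minority feature acts as the query, majority features as keys.
    scores = [sum(q * k for q, k in zip(minority, feat)) / scale
              for feat in majority_bank]
    weights = softmax(scores)
    # Attention-weighted average of the majority features (values = keys).
    context = [sum(w * feat[i] for w, feat in zip(weights, majority_bank))
               for i in range(dim)]
    # Convex combination of the minority feature and the majority context.
    mixed = [(1 - mix_ratio) * q + mix_ratio * c
             for q, c in zip(minority, context)]
    return mixed, weights

# Toy 4-dimensional features (assumed values, for illustration only).
minority_feature = [1.0, 0.0, 0.5, -0.5]
majority_features = [
    [0.9, 0.1, 0.4, -0.4],
    [-1.0, 0.5, 0.0, 0.2],
    [0.2, 0.2, 0.2, 0.2],
]
mixed_feature, attn = attention_mix(minority_feature, majority_features)
```

In this sketch, majority features that are most similar to the minority feature receive the largest attention weights, and `mix_ratio` controls how much majority information enters the blended feature. In a pipeline like the one the abstract outlines, such a blended feature would presumably condition the generator, but that connection is an assumption here.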
Pages: 13