Long-tailed video recognition via majority-guided diffusion model

Times cited: 0
Authors
Hu, Yufan [1]
Zhang, Yi [1]
Zhang, Lixin [1]
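Affiliation
[1] University of Science and Technology Beijing, School of Intelligence Science and Technology, Beijing 100083, People's Republic of China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; China Postdoctoral Science Foundation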
Keywords
Long-tailed distribution; Diffusion model; Video classification; Imbalanced learning
DOI
10.1007/s00530-024-01624-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Long-tailed video recognition presents a significant challenge due to the imbalanced distribution of samples across classes: majority classes contain abundant samples, while minority classes are severely underrepresented. Existing methods primarily focus on resampling, reweighting, and architectural modifications, which do not fully leverage the rich information contained in majority-class samples. Motivated by this, we propose a novel majority-guided diffusion model to address the long-tailed distribution problem in video recognition. Specifically, we introduce an attention-based feature mix module (AFM) to blend majority- and minority-class information, followed by a minority-class data generator (MDG) that synthesizes diverse minority-class samples using a latent diffusion model. By leveraging the rich information in majority-class samples, our method generates realistic minority-class samples that improve model performance on underrepresented categories. Extensive experiments on long-tailed video recognition benchmarks validate the effectiveness of the proposed framework.
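The abstract describes AFM and MDG only at a high level. The Python sketch below shows one plausible reading, under stated assumptions: a minority-class feature acts as the query in scaled dot-product attention over majority-class features (keys double as values), and the attended context is blended back with a mixing weight. It then applies the standard DDPM forward noising step, x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, which a latent diffusion generator is trained to invert. The function names, the weight `lam`, and the linear beta schedule are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_feature_mix(
    minority: Sequence[float],
    majority: Sequence[Sequence[float]],
    lam: float = 0.7,  # hypothetical mixing weight, not from the paper
) -> Vector:
    """Hypothetical AFM-style mixing: the minority feature attends over the
    majority features (scaled dot-product, keys = values), and the attended
    context is blended in with weight (1 - lam)."""
    d = len(minority)
    scores = [sum(q * k for q, k in zip(minority, key)) / math.sqrt(d) for key in majority]
    weights = softmax(scores)
    context = [sum(w * key[j] for w, key in zip(weights, majority)) for j in range(d)]
    return [lam * q + (1.0 - lam) * c for q, c in zip(minority, context)]


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Product of (1 - beta_s) over the first t steps of a linear beta schedule
    (an assumed schedule; t = 0 means no noise has been added)."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def forward_diffuse(x0: Sequence[float], t: int,
                    noise: Optional[Sequence[float]] = None,
                    num_steps: int = 1000) -> Vector:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    ab = alpha_bar(t, num_steps)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]


if __name__ == "__main__":
    minority = [1.0, 0.0, 0.0, 0.0]
    majority = [[0.9, 0.1, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    mixed = attention_feature_mix(minority, majority)
    print(forward_diffuse(mixed, t=500))
```

In this sketch the majority keys most similar to the minority query receive the largest attention weights, so the mixed feature borrows mainly from related majority samples; setting `lam = 1.0` disables the mixing entirely.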
Pages: 13