Zero and few shot action recognition in videos with caption semantic and generative assist

Cited by: 0
Authors
Thrilokachandran G. [1 ]
Hosalli Ramappa M. [1 ]
Affiliations
[1] Department of Computer Science and Engineering, PES University, Bengaluru
Keywords
Caption Generation; Few Shot Action Recognition; Image Generation; Multimodal learning; Zero Shot Action Recognition;
DOI
10.1007/s41870-024-01808-y
Abstract
This research introduces a trimodal approach that integrates image captioning, image generation, and action semantics to achieve zero-shot and few-shot action recognition. The modalities are implemented by three modules: Image Action Captioning, Generated Image Similarity Analyzer, and Action Semantics Analyzer. The Image Action Captioning model uses CLIP (Contrastive Language-Image Pretraining) to generate captions. The Generated Image Similarity Analyzer employs the Stable Diffusion model to generate action images, while the Action Semantics Analyzer combines action recognition and text embedding. The results of the three modules are integrated using soft voting. This approach allows for a more comprehensive understanding of different domains, thereby enhancing the accuracy of zero-shot and few-shot action recognition. Compared to the state-of-the-art model, the Caption Semantic Generative Assist improves zero-shot action recognition accuracy by +4.97% on the UCF101 dataset and +5.32% on the HMDB51 dataset, and improves few-shot action recognition accuracy by +2.47% on UCF101 and +11.18% on HMDB51. The source code of this work will be made available at https://github.com/GayathriThriloka/CSGA.git © The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
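The abstract states that the outputs of the three modules are integrated by soft voting. A minimal sketch of that fusion step is shown below; the class scores, equal weighting, and variable names are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def soft_vote(module_scores, weights=None):
    """Fuse per-class probability scores from several modules by
    (optionally weighted) averaging, then pick the top-scoring class."""
    scores = np.asarray(module_scores, dtype=float)  # shape: (n_modules, n_classes)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)  # equal weights by default
    fused = np.average(scores, axis=0, weights=weights)
    return fused, int(np.argmax(fused))

# Toy probability scores over 3 hypothetical action classes
caption_scores  = [0.6, 0.3, 0.1]   # Image Action Captioning
genimg_scores   = [0.5, 0.2, 0.3]   # Generated Image Similarity Analyzer
semantic_scores = [0.4, 0.4, 0.2]   # Action Semantics Analyzer

fused, pred = soft_vote([caption_scores, genimg_scores, semantic_scores])
# fused == [0.5, 0.3, 0.2]; predicted class index is 0
```

With equal weights this is a plain average of per-class probabilities, so a class that no single module ranks decisively first can still win if all three assign it moderately high probability.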
Pages: 3121-3133
Page count: 12