Zero and few shot action recognition in videos with caption semantic and generative assist

Cited by: 0
Authors
Thrilokachandran G. [1 ]
Hosalli Ramappa M. [1 ]
Affiliations
[1] Department of Computer Science and Engineering, PES University, Bengaluru
Keywords
Caption Generation; Few Shot Action Recognition; Image Generation; Multimodal learning; Zero Shot Action Recognition;
DOI
10.1007/s41870-024-01808-y
Abstract
This research introduces a trimodal approach that integrates image captioning, image generation, and action semantics to achieve zero-shot and few-shot action recognition. The modalities are implemented by three modules: Image Action Captioning, Generated Image Similarity Analyzer, and Action Semantics Analyzer. The Image Action Captioning model uses CLIP (Contrastive Language-Image Pretraining) to generate captions. The Generated Image Similarity Analyzer employs the Stable Diffusion model to generate action images, while the Action Semantics Analyzer combines action recognition and text embedding. The results of the three modules are integrated using soft voting. This approach allows for a more comprehensive understanding of different domains, thereby enhancing the accuracy of zero-shot and few-shot action recognition. Compared to the state-of-the-art model, the Caption Semantic Generative Assist improves zero-shot action recognition accuracy by +4.97% on the UCF101 dataset and +5.32% on the HMDB51 dataset, and improves few-shot action recognition accuracy by +2.47% on UCF101 and +11.18% on HMDB51. The source code of this work will be made available at https://github.com/GayathriThriloka/CSGA.git © The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
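The abstract states that the outputs of the three modules are integrated by soft voting. A minimal sketch of that fusion step is shown below; the class scores, equal weighting, and variable names are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def soft_vote(module_scores, weights=None):
    """Fuse per-class probability scores from several modules by
    (optionally weighted) averaging, then pick the top-scoring class."""
    scores = np.asarray(module_scores, dtype=float)  # shape: (n_modules, n_classes)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)  # equal weights by default
    fused = np.average(scores, axis=0, weights=weights)
    return fused, int(np.argmax(fused))

# Toy probability scores over 3 hypothetical action classes
caption_scores  = [0.6, 0.3, 0.1]   # Image Action Captioning
genimg_scores   = [0.5, 0.2, 0.3]   # Generated Image Similarity Analyzer
semantic_scores = [0.4, 0.4, 0.2]   # Action Semantics Analyzer

fused, pred = soft_vote([caption_scores, genimg_scores, semantic_scores])
# fused == [0.5, 0.3, 0.2]; predicted class index is 0
```

With equal weights this is a plain average of per-class probabilities, so a class that no single module ranks decisively first can still win if all three assign it moderately high probability.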
Pages: 3121-3133
Page count: 12