Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition

Cited by: 0
Authors
Ni, Xinzhe [1 ]
Liu, Yong [1 ]
Wen, Hao [1 ]
Ji, Yatai [1 ]
Xiao, Jing [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen, Peoples R China
[2] Ping An Insurance Grp Co China, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Few-shot action recognition; Prototype; Multimodal understanding;
DOI
10.1145/3652583.3658044
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current methods for few-shot action recognition mostly follow the metric learning framework of ProtoNet, which demonstrates the importance of prototypes. Although they achieve relatively good performance, they ignore the effect of multimodal information, e.g., label texts. In this work, we propose a novel MultimOdal PRototype-ENhanced Network (MORN), which uses the semantic information of label texts as multimodal information to enhance prototypes. A CLIP visual encoder and a frozen CLIP text encoder are introduced to obtain features with a good multimodal initialization. In the visual flow, visual prototypes are computed by a visual prototype-computed module; in the text flow, a semantic-enhanced (SE) module and an inflating operation produce text prototypes. The final multimodal prototypes are then computed by a multimodal prototype-enhanced (MPE) module. In addition, we define a PRototype SImilarity DiffErence (PRIDE) metric to evaluate prototype quality, which verifies our improvement at the prototype level and the effectiveness of MORN. We conduct extensive experiments on four popular few-shot action recognition datasets, HMDB51, UCF101, Kinetics and SSv2, and MORN achieves state-of-the-art results. Plugging PRIDE into the training stage further improves performance.
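The core idea sketched in the abstract can be illustrated with a minimal NumPy example: ProtoNet-style visual prototypes (the per-class mean of support features) are blended with label-text embeddings to form multimodal prototypes, and a query is classified by its nearest prototype under cosine similarity. The blend weight `alpha` and the simple convex combination are assumptions for illustration only; the paper's actual SE and MPE modules are learned components whose details are not given in the abstract.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale feature vectors to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def multimodal_prototypes(support_feats, text_feats, alpha=0.5):
    """Blend per-class visual prototypes with label-text embeddings.

    support_feats: (n_classes, n_shots, dim) visual features of support videos
    text_feats:    (n_classes, dim) label-text embeddings (e.g. from a CLIP
                   text encoder)
    alpha:         blend weight; a plain convex combination stands in here
                   for the paper's learned MPE module.
    """
    visual_protos = support_feats.mean(axis=1)  # ProtoNet-style class means
    protos = alpha * visual_protos + (1.0 - alpha) * text_feats
    return l2_normalize(protos)

def classify(query_feat, protos):
    # Assign the query to the nearest prototype under cosine similarity.
    sims = l2_normalize(query_feat) @ protos.T
    return int(np.argmax(sims))

# Toy 2-way 3-shot episode in a 3-dim feature space.
support = np.stack([np.tile([1.0, 0.0, 0.0], (3, 1)),
                    np.tile([0.0, 1.0, 0.0], (3, 1))])
text = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
protos = multimodal_prototypes(support, text, alpha=0.5)
print(classify(np.array([0.9, 0.1, 0.0]), protos))  # → 0
```

In a real pipeline the support and query features would come from a CLIP visual encoder applied to video frames, with the text prototypes "inflated" along the temporal dimension to match, as the abstract describes.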
Pages: 1-10
Related papers (50 total)
  • [1] Multidimensional Prototype Refactor Enhanced Network for Few-Shot Action Recognition
    Liu, Shuwen
    Jiang, Min
    Kong, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6955 - 6966
  • [2] Compound Prototype Matching for Few-Shot Action Recognition
    Huang, Yifei
    Yang, Lijin
    Sato, Yoichi
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 351 - 368
  • [3] Reconstructed Prototype Network Combined with CDC-TAGCN for Few-Shot Action Recognition
    Wu, Aihua
    Ding, Songyu
    APPLIED SCIENCES-BASEL, 2023, 13 (20):
  • [4] Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition
    Wanyan, Yuyang
    Yang, Xiaoshan
    Chen, Chaofan
    Xu, Changsheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6492 - 6502
  • [5] CLIP-guided Prototype Modulating for Few-shot Action Recognition
    Wang, Xiang
    Zhang, Shiwei
    Cen, Jun
    Gao, Changxin
    Zhang, Yingya
    Zhao, Deli
    Sang, Nong
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (06) : 1899 - 1912
  • [6] Knowledge Graph enhanced Multimodal Learning for Few-shot Visual Recognition
    Han, Mengya
    Zhan, Yibing
    Yu, Baosheng
    Luo, Yong
    Du, Bo
    Tao, Dacheng
    2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022,
  • [7] Multimodal Few-Shot Learning for Gait Recognition
    Moon, Jucheol
    Nhat Anh Le
    Minaya, Nelson Hebert
    Choi, Sang-Il
    APPLIED SCIENCES-BASEL, 2020, 10 (21): : 1 - 15
  • [8] Hybrid attentive prototypical network for few-shot action recognition
    Ruan, Zanxi
    Wei, Yingmei
    Guo, Yanming
    Xie, Yuxiang
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (06) : 8249 - 8272
  • [9] Intermediate prototype network for few-shot segmentation
    Luo, Xiaoliu
    Duan, Zhao
    Zhang, Taiping
    SIGNAL PROCESSING, 2023, 203
  • [10] Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
    Hatano, Masashi
    Hachiuma, Ryo
    Fujii, Ryo
    Saito, Hideo
    COMPUTER VISION - ECCV 2024, PT XXXIII, 2025, 15091 : 182 - 199