Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition

Cited by: 0
Authors
Ni, Xinzhe [1 ]
Liu, Yong [1 ]
Wen, Hao [1 ]
Ji, Yatai [1 ]
Xiao, Jing [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen, Peoples R China
[2] Ping An Insurance Grp Co China, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Few-shot action recognition; Prototype; Multimodal understanding;
DOI
10.1145/3652583.3658044
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current methods for few-shot action recognition mostly follow the metric learning framework of ProtoNet, which demonstrates the importance of prototypes. Although they achieve relatively good performance, they ignore the effect of multimodal information, e.g., label texts. In this work, we propose a novel MultimOdal PRototype-ENhanced Network (MORN), which uses the semantic information of label texts as multimodal information to enhance prototypes. A CLIP visual encoder and a frozen CLIP text encoder are introduced to obtain features with a good multimodal initialization. In the visual flow, visual prototypes are computed by a visual prototype-computed module; in the text flow, a semantic-enhanced (SE) module and an inflating operation produce text prototypes. The final multimodal prototypes are then computed by a multimodal prototype-enhanced (MPE) module. In addition, we define a PRototype SImilarity DiffErence (PRIDE) metric to evaluate prototype quality, which verifies our improvement at the prototype level and the effectiveness of MORN. We conduct extensive experiments on four popular few-shot action recognition datasets, HMDB51, UCF101, Kinetics and SSv2, and MORN achieves state-of-the-art results. Plugging PRIDE into the training stage further improves performance.
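The core idea sketched in the abstract can be illustrated with a minimal NumPy example: ProtoNet-style visual prototypes (the per-class mean of support features) are blended with label-text embeddings to form multimodal prototypes, and a query is classified by its nearest prototype under cosine similarity. The blend weight `alpha` and the simple convex combination are assumptions for illustration only; the paper's actual SE and MPE modules are learned components whose details are not given in the abstract.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale feature vectors to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def multimodal_prototypes(support_feats, text_feats, alpha=0.5):
    """Blend per-class visual prototypes with label-text embeddings.

    support_feats: (n_classes, n_shots, dim) visual features of support videos
    text_feats:    (n_classes, dim) label-text embeddings (e.g. from a CLIP
                   text encoder)
    alpha:         blend weight; a plain convex combination stands in here
                   for the paper's learned MPE module.
    """
    visual_protos = support_feats.mean(axis=1)  # ProtoNet-style class means
    protos = alpha * visual_protos + (1.0 - alpha) * text_feats
    return l2_normalize(protos)

def classify(query_feat, protos):
    # Assign the query to the nearest prototype under cosine similarity.
    sims = l2_normalize(query_feat) @ protos.T
    return int(np.argmax(sims))

# Toy 2-way 3-shot episode in a 3-dim feature space.
support = np.stack([np.tile([1.0, 0.0, 0.0], (3, 1)),
                    np.tile([0.0, 1.0, 0.0], (3, 1))])
text = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
protos = multimodal_prototypes(support, text, alpha=0.5)
print(classify(np.array([0.9, 0.1, 0.0]), protos))  # → 0
```

In a real pipeline the support and query features would come from a CLIP visual encoder applied to video frames, with the text prototypes "inflated" along the temporal dimension to match, as the abstract describes.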
Pages: 1-10
Related papers (50 total)
  • [1] Multidimensional Prototype Refactor Enhanced Network for Few-Shot Action Recognition
    Liu, Shuwen
    Jiang, Min
    Kong, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6955 - 6966
  • [2] Compound Prototype Matching for Few-Shot Action Recognition
    Huang, Yifei
    Yang, Lijin
    Sato, Yoichi
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 351 - 368
  • [3] Reconstructed Prototype Network Combined with CDC-TAGCN for Few-Shot Action Recognition
    Wu, Aihua
    Ding, Songyu
    APPLIED SCIENCES-BASEL, 2023, 13 (20):
  • [4] Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition
    Wanyan, Yuyang
    Yang, Xiaoshan
    Chen, Chaofan
    Xu, Changsheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6492 - 6502
  • [5] CLIP-guided Prototype Modulating for Few-shot Action Recognition
    Wang, Xiang
    Zhang, Shiwei
    Cen, Jun
    Gao, Changxin
    Zhang, Yingya
    Zhao, Deli
    Sang, Nong
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (06) : 1899 - 1912
  • [6] Knowledge Graph enhanced Multimodal Learning for Few-shot Visual Recognition
    Han, Mengya
    Zhan, Yibing
    Yu, Baosheng
    Luo, Yong
    Du, Bo
    Tao, Dacheng
    2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022,
  • [7] Multimodal Few-Shot Learning for Gait Recognition
    Moon, Jucheol
    Nhat Anh Le
    Minaya, Nelson Hebert
    Choi, Sang-Il
    APPLIED SCIENCES-BASEL, 2020, 10 (21): : 1 - 15
  • [8] Hybrid attentive prototypical network for few-shot action recognition
    Ruan, Zanxi
    Wei, Yingmei
    Guo, Yanming
    Xie, Yuxiang
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (06) : 8249 - 8272
  • [9] Intermediate prototype network for few-shot segmentation
    Luo, Xiaoliu
    Duan, Zhao
    Zhang, Taiping
    SIGNAL PROCESSING, 2023, 203
  • [10] Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
    Hatano, Masashi
    Hachiuma, Ryo
    Fujii, Ryo
    Saito, Hideo
    COMPUTER VISION - ECCV 2024, PT XXXIII, 2025, 15091 : 182 - 199