Multidimensional Prototype Refactor Enhanced Network for Few-Shot Action Recognition

Cited by: 21
Authors
Liu, Shuwen [1]
Jiang, Min [1]
Kong, Jun [2]
Affiliations
[1] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Prototypes; Training; Feature extraction; Optimization; Image recognition; Face recognition; Visualization; Few-shot action recognition; prototype enhancement; similarity optimization; temporal modeling;
DOI
10.1109/TCSVT.2022.3175923
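Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract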
Few-shot action recognition classifies new actions from only a few training samples. Mainstream methods take the class mean as the prototype that represents each category. However, because the sample count is small and extreme samples occur, mean-of-class prototypes cannot represent the average level of the samples well. In this paper, we enhance the prototypes along multiple dimensions for better classification. First, we propose a novel similarity optimization mechanism in which a Prototype Aggregation Adaptive Loss (PAAL) is designed to deeply mine the similarity between samples and prototypes, improving the ability to identify fine-grained inter-class differences. Second, to mitigate the impact of individual samples on class prototypes, we refactor the prototype calculation formula with a Cross-Enhanced Prototype (CEP) that narrows intra-class differences, in which Reweighted Similarity Attention (RSA) is designed to update the prototypes. Finally, Dynamic Temporal Transformation (DTT) is proposed to alleviate the inconsistent distribution of temporal information and obtain better video-level descriptors. Extensive experiments on standard benchmark datasets demonstrate that the proposed method achieves state-of-the-art results.
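The weakness of mean-of-class prototypes described in the abstract can be illustrated with a minimal sketch. It is not the paper's CEP, RSA, or PAAL; it only shows the class-mean baseline the abstract refers to. The 2-D feature values, the class labels, and the helper names (class_mean_prototype, squared_distance, classify) are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: class-mean prototypes on toy 2-D features.
# All values and helper names are assumptions, not taken from the paper.

def class_mean_prototype(features):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Two typical support samples for one class, plus one extreme sample.
typical = [[1.0, 1.0], [1.0, 3.0]]
extreme = [9.0, 8.0]

clean = class_mean_prototype(typical)              # prototype without the extreme sample
skewed = class_mean_prototype(typical + [extreme])  # prototype with it

# How far a single extreme sample drags a 3-shot prototype.
shift = squared_distance(clean, skewed)

# Nearest-prototype classification with the skewed prototype.
prototypes = {
    "jump": skewed,
    "run": class_mean_prototype([[5.0, 5.0], [6.0, 6.0], [7.0, 4.0]]),
}
predicted = classify([1.0, 2.0], prototypes)
```

With only three support samples, one extreme sample moves the prototype well away from the two typical ones, which is the motivation the abstract gives for replacing the plain mean with a reweighted prototype.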
Pages: 6955-6966
Page count: 12