Matching Video Net: Memory-based embedding for video action recognition

Citations: 0
Authors
Kim, Daesik [1]
Lee, Myunggi [1]
Kwak, Nojun [1]
Affiliations
[1] Seoul Natl Univ, Grad Sch Convergence Sci & Technol, Seoul, South Korea
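Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract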
Most recent successful research on action recognition is based on deep learning architectures. However, training deep neural networks notoriously requires a huge amount of data, and insufficient data can lead to an overfitted model. In this work, we propose a novel model, Matching Video Net (MVN), which can be trained with a small amount of data. To avoid overfitting, we use a non-parametric setup on top of parametric networks with external memories. An input video clip is transformed into an embedding space and matched against the memorized samples in that space. The similarities between the input and the memorized data are then measured to determine the nearest neighbors. We perform experiments in a supervised manner on action recognition datasets, achieving state-of-the-art results. Moreover, we apply our model to one-shot learning problems with a novel training strategy. Our model achieves surprisingly good results in predicting unseen action classes from only a few examples.
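The matching step described in the abstract (embed a clip, compare it with memorized embeddings, pick the nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity measure, the majority vote over k neighbors, the function names, and the toy 3-dimensional vectors standing in for encoder outputs are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_memory(query, memory, k=1):
    """Return the majority label among the k memory entries most similar to query.

    memory is a list of (embedding, label) pairs held outside the network,
    playing the role of the external memory in this sketch.
    """
    ranked = sorted(
        memory,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical embeddings; a real system would obtain these from a video encoder.
memory = [
    ([1.0, 0.0, 0.0], "run"),
    ([0.9, 0.1, 0.0], "run"),
    ([0.0, 1.0, 0.0], "jump"),
    ([0.0, 0.9, 0.2], "jump"),
]
query = [0.8, 0.2, 0.1]
prediction = classify_by_memory(query, memory, k=3)
print(prediction)
```

Because the classifier itself has no trainable parameters, adding a new action class in this sketch only requires appending labeled embeddings to the memory list, which is the property that makes a memory-based setup attractive when few examples are available.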
Pages: 432-438
Page count: 7
Related papers
50 in total
  • [1] Memory-Based Neighbourhood Embedding for Visual Recognition
    Li, Suichan
    Chen, Dapeng
    Liu, Bin
    Yu, Nenghai
    Zhao, Rui
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6101 - 6110
  • [2] Memory-Based Augmentation Network for Video Captioning
    Jing, Shuaiqi
    Zhang, Haonan
    Zeng, Pengpeng
    Gao, Lianli
    Song, Jingkuan
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2367 - 2379
  • [3] Action Recognition in Video by Covariance Matching of Silhouette Tunnels
    Guo, Kai
    Ishwar, Prakash
    Konrad, Janusz
    2009 XXII BRAZILIAN SYMPOSIUM ON COMPUTER GRAPHICS AND IMAGE PROCESSING (SIBGRAPI 2009), 2009, : 299 - 306
  • [4] Memory-based moving object extraction for video indexing
    Wang, RRY
    Hong, PY
    Huang, T
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS: COMPUTER VISION AND IMAGE ANALYSIS, 2000, : 811 - 814
  • [5] Multi-Scale Memory-Based Video Deblurring
    Ji, Bo
    Yao, Angela
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1918 - 1927
  • [6] Exploiting recollection effects for memory-based video object segmentation
    Cho, E.
    Kim, M.
    Kim, H.-I.
    Moon, J.
    Kim, S.T.
    IMAGE AND VISION COMPUTING, 2023, 140
  • [7] Multi modal human action recognition for video content matching
    Guo, Jun
    Bai, Hao
    Tang, Zhanyong
    Xu, Pengfei
    Gan, Daguang
    Liu, Baoying
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (45-46) : 34665 - 34683
  • [8] Human pose recognition by memory-based hierarchical feature matching
    Urano, T
    Matsui, T
    Nakata, T
    Mizoguchi, H
    2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 6412 - 6416
  • [9] Design of A Memory-Based VLC Decoder for Portable Video Applications
    Lee, Wei-Chin
    Li, Yao
    Lee, Chen-Yi
    2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4, 2008, : 1340 - 1343