Matching Video Net: Memory-based embedding for video action recognition

Cited by: 0
Authors
Kim, Daesik [1]
Lee, Myunggi [1]
Kwak, Nojun [1]
Affiliation
[1] Seoul National University, Graduate School of Convergence Science and Technology, Seoul, South Korea
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most recent successful work on action recognition is based on deep learning architectures. However, training deep neural networks is notorious for requiring a huge amount of data, and too little data can lead to an overfitted model. In this work, we propose a novel model, the matching video net (MVN), which can be trained with a small amount of data. To avoid overfitting, we use a non-parametric setup on top of parametric networks with external memories. An input video clip is transformed into an embedding space and matched against the memorized samples in that space. The similarities between the input and the memorized data are then measured to determine the nearest neighbors. We perform supervised experiments on action recognition datasets and achieve state-of-the-art results. Moreover, we apply our model to one-shot learning problems with a novel training strategy. Our model achieves surprisingly good results in predicting unseen action classes from only a few examples.
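The matching step described in the abstract (embed a clip, compare it with memorized embeddings, pick the nearest neighbors) can be illustrated with a small sketch. This is not the authors' implementation. It assumes clip embeddings have already been produced by some encoder, uses cosine similarity as the similarity measure, and converts similarities into a class distribution with a softmax over the memory slots (the attention-weighted rule popularized by Matching Networks). The function names `cosine_similarities` and `soft_nearest_neighbor` and the toy 2-D vectors are hypothetical.

```python
import numpy as np


def cosine_similarities(query, memory):
    """Cosine similarity between one query embedding and each memorized embedding."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return m @ q


def soft_nearest_neighbor(query, memory, memory_labels, num_classes):
    """Hypothetical sketch: class distribution from softmax-weighted memory labels."""
    sims = cosine_similarities(query, memory)
    # Softmax over memory slots turns similarities into attention weights
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    # Accumulate each slot's weight onto its class (equivalent to summing one-hot labels)
    probs = np.zeros(num_classes)
    np.add.at(probs, memory_labels, weights)
    return probs


# Toy usage: 2-D vectors stand in for the output of a (hypothetical) video encoder
memory = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = np.array([0, 0, 1])
query = np.array([0.8, 0.2])
probs = soft_nearest_neighbor(query, memory, labels, num_classes=2)
print(probs.argmax())  # prints 0
```

Because prediction only compares the query with stored embeddings, appending a few labelled embeddings of a new class to `memory` lets the same function score that class without updating any network weights. This is one way to see how a memory-based, non-parametric setup can handle unseen classes from a few examples.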
Pages: 432-438
Number of pages: 7
Related papers
50 in total
  • [31] Badminton video action recognition based on time network
    Zhi, Juncai
    Sun, Zijie
    Zhang, Ruijie
    Zhao, Zhouxiang
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2023, 23 (05) : 2739 - 2752
  • [32] Deep Moving Poselets for Video Based Action Recognition
    Mavroudi, Effrosyni
    Tao, Lingling
    Vidal, Rene
    2017 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2017), 2017, : 111 - 120
  • [33] OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
    Jin, Dongkwon
    Kim, Chang-Su
    COMPUTER VISION - ECCV 2024, PT XXXIII, 2025, 15091 : 129 - 145
  • [34] Few-Shot Learning of Video Action Recognition Only Based on Video Contents
    Bo, Yang
    Lu, Yangdi
    He, Wenbo
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 584 - 593
  • [35] LAE-Net: Light and Efficient Network for Compressed Video Action Recognition
    Guo, Jinxin
    Zhang, Jiaqiang
    Zhang, Xiaojing
    Ma, Ming
    MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834 : 265 - 276
  • [36] MV-TON: Memory-based Video Virtual Try-on network
    Zhong, Xiaojing
    Wu, Zhonghua
    Tan, Taizhe
    Lin, Guosheng
    Wu, Qingyao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 908 - 916
  • [37] Memory-based gradient-guided progressive propagation network for video deblurring
    Song, Gusu
    Gai, Shaoyan
    Da, Feipeng
    VISUAL COMPUTER, 2025, 41 (01) : 25 - 40
  • [38] MT-Net: Fast video instance lane detection based on space time memory and template matching
    Shi, Peicheng
    Zhang, Chenghui
    Xu, Shucai
    Qi, Heng
    Chen, Xinhe
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 91
  • [39] A Method of Simultaneously Action Recognition and Video Segmentation of Video Streams
    Ji, Liang
    Xiong, Rong
    Wang, Yue
    Yu, Hongsheng
    2017 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (IEEE ROBIO 2017), 2017, : 1515 - 1520
  • [40] VIDEO STEGANOGRAPHY BASED ON EMBEDDING THE VIDEO USING PCF TECHNIQUE
    Rajalakshmi, K.
    Mahesh, K.
    2017 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2017,