Temporal Query Networks for Fine-grained Video Understanding

Cited by: 50
Authors
Zhang, Chuhan [1]
Gupta, Ankush [2]
Zisserman, Andrew [1]
Affiliations
[1] Univ Oxford, Oxford, England
[2] DeepMind, London, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/CVPR46437.2021.00446
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each query addresses a particular question and has its own response label set. We make the following four contributions: (i) We propose a new model, the Temporal Query Network (TQN), which enables the query-response functionality and a structural understanding of fine-grained actions. It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new training method, stochastic feature bank update, that trains a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) We compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features. Project page: https://www.robots.ox.ac.uk/~vgg/research/tqn/.
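The query-response idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scaled dot-product attention step per query followed by a linear classifier over that query's own label set, and the function name `answer_query`, the query names `num_turns` and `position`, and all dimensions are hypothetical. The actual TQN architecture may differ substantially.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def answer_query(query_vec, segment_feats, class_weights):
    """Attend over time with one query, then score that query's own labels.

    query_vec:      list of D floats (hypothetical learned query vector)
    segment_feats:  T lists of D floats, one per video segment
    class_weights:  K lists of D floats, one per label of this query
    Returns (attention over the T segments, probabilities over the K labels).
    """
    dim = len(query_vec)
    # Scaled dot-product score between the query and every segment.
    scores = [dot(query_vec, f) / math.sqrt(dim) for f in segment_feats]
    attention = softmax(scores)
    # Attention-weighted average of the segment features.
    pooled = [
        sum(a * f[d] for a, f in zip(attention, segment_feats))
        for d in range(dim)
    ]
    # Linear classifier restricted to this query's label set.
    logits = [dot(w, pooled) for w in class_weights]
    return attention, softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    D, T = 8, 20  # feature size and number of segments (arbitrary choices)
    feats = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
    # Two hypothetical queries whose label sets have different sizes.
    queries = {"num_turns": 4, "position": 3}
    for name, num_labels in queries.items():
        q = [rng.gauss(0, 1) for _ in range(D)]
        w = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(num_labels)]
        attn, probs = answer_query(q, feats, w)
        print(name, len(attn), [round(p, 3) for p in probs])
```

Because each query carries its own classifier, the label sets can differ in size from query to query, and the attention weights indicate which segments contributed most to each answer.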
Pages: 4484-4494
Page count: 11
Related papers
50 in total
  • [31] FIVR: Fine-Grained Incident Video Retrieval
    Kordopatis-Zilos, Giorgos
    Papadopoulos, Symeon
    Patras, Ioannis
    Kompatsiaris, Ioannis
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (10) : 2638 - 2652
  • [32] Streaming fine-grained scalable video over packet-based networks
    Cohen, R
    Radha, H
    GLOBECOM '00: IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE, VOLS 1-3, 2000, : 288 - 292
  • [33] Fine-Grained Crowdsourcing for Fine-Grained Recognition
    Jia Deng
    Krause, Jonathan
    Li Fei-Fei
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 580 - 587
  • [34] Fine-grained traffic video vehicle recognition based orientation estimation and temporal information
    Anqi Hu
    Zhengxing Sun
    Qian Li
    Yechao Xu
    Yihuan Zhu
    Sheng Zhang
    Multimedia Tools and Applications, 2023, 82 : 13745 - 13763
  • [35] Fine-grained Geolocation of Tweets in Temporal Proximity
    Chong, Wen-Haw
    Lim, Ee-Peng
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2019, 37 (02)
  • [36] Understanding Objects in Detail with Fine-grained Attributes
    Vedaldi, Andrea
    Mahendran, Siddharth
    Tsogkas, Stavros
    Maji, Subhransu
    Girshick, Ross
    Kannala, Juho
    Rahtu, Esa
    Kokkinos, Iasonas
    Blaschko, Matthew B.
    Weiss, David
    Taskar, Ben
    Simonyan, Karen
    Saphra, Naomi
    Mohamed, Sammy
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3622 - 3629
  • [37] Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition
    Ramesh, Sanat
    Dall'Alba, Diego
    Gonzalez, Cristians
    Yu, Tong
    Mascagni, Pietro
    Mutter, Didier
    Marescaux, Jacques
    Fiorini, Paolo
    Padoy, Nicolas
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (09) : 2592 - 2602
  • [38] Fine-grained traffic video vehicle recognition based orientation estimation and temporal information
    Hu, Anqi
    Sun, Zhengxing
    Li, Qian
    Xu, Yechao
    Zhu, Yihuan
    Zhang, Sheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (09) : 13745 - 13763
  • [39] Online video advertising based on fine-grained video tags
    Lu, Feng
    Wang, Zirui
    Liao, Xiaofei
    Jin, Hai
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2014, 51 (12) : 2733 - 2745
  • [40] Query-guided networks for few-shot fine-grained classification and person search
    Munjal, Bharti
    Flaborea, Alessandro
    Amin, Sikandar
    Tombari, Federico
    Galasso, Fabio
    PATTERN RECOGNITION, 2023, 133