Temporal Query Networks for Fine-grained Video Understanding

Cited by: 50
Authors
Zhang, Chuhan [1]
Gupta, Ankush [2]
Zisserman, Andrew [1]
Affiliations
[1] Univ Oxford, Oxford, England
[2] DeepMind, London, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
DOI
10.1109/CVPR46437.2021.00446
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this as a query-response mechanism, where each query addresses a particular question and has its own response label set. We make the following four contributions: (i) We propose a new model, the Temporal Query Network (TQN), which enables the query-response functionality and a structural understanding of fine-grained actions. It attends to the relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new training method, stochastic feature bank update, for training a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) We compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state of the art using only RGB features. Project page: https://www.robots.ox.ac.uk/~vgg/research/tqn/
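The query-response idea in the abstract can be illustrated with a minimal sketch: each query vector attends over per-segment video features, and the pooled result is classified by that query's own head with its own label set. This is not the authors' implementation. The abstract does not specify the architecture, so the single attention layer, the random (untrained) weights, the query names `num_somersaults` and `takeoff_type`, and all function names here are assumptions made for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Scaled dot-product attention of one query over per-segment features."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def respond(queries, heads, features):
    """Answer every query: pool the video with attention, then classify
    with the head (one weight row per label) that belongs to that query."""
    answers = {}
    for name, q in queries.items():
        pooled, weights = attend(q, features)
        logits = [dot(row, pooled) for row in heads[name]]
        answers[name] = (logits.index(max(logits)), weights)
    return answers


rng = random.Random(0)
dim, num_segments = 8, 20

# Stand-in for visual features of 20 temporal segments (assumption: random).
features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_segments)]

# Hypothetical queries, each with a label set of a different size.
label_counts = {"num_somersaults": 4, "takeoff_type": 3}
queries = {n: [rng.gauss(0, 1) for _ in range(dim)] for n in label_counts}
heads = {
    n: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
    for n, k in label_counts.items()
}

answers = respond(queries, heads, features)
for name, (label, weights) in answers.items():
    top_segment = weights.index(max(weights))
    print(f"{name}: label {label}, most attended segment {top_segment}")
```

In a trained model the queries and heads would be learned from the per-query labels, and the attention weights would indicate which segments each query relies on.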
Pages: 4484-4494
Number of pages: 11