Temporal Query Networks for Fine-grained Video Understanding

Cited by: 50
Authors
Zhang, Chuhan [1]
Gupta, Ankush [2]
Zisserman, Andrew [1]
Affiliations
[1] Univ Oxford, Oxford, England
[2] DeepMind, London, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/CVPR46437.2021.00446
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each query addresses a particular question and has its own response label set. We make the following four contributions: (i) We propose a new model, the Temporal Query Network (TQN), which enables the query-response functionality and a structural understanding of fine-grained actions. It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new way, stochastic feature bank update, to train a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) We compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state of the art using only RGB features. Project page: https://www.robots.ox.ac.uk/~vgg/research/tqn/.
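The query-response idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention layer, random (untrained) weights, and hypothetical names (`query_response`, `heads`, `label_set_sizes`). It only shows how each query can attend over per-frame features and produce logits over its own label set.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_response(features, queries, heads):
    """Toy query-response step (illustrative assumption, not the paper's model).

    features: (T, D) array of per-timestep video features.
    queries:  (K, D) array, one vector per query.
    heads:    list of K arrays; heads[k] has shape (D, C_k), where C_k is the
              size of the response label set for query k.
    """
    d = features.shape[1]
    # Scaled dot-product scores between every query and every timestep.
    scores = queries @ features.T / np.sqrt(d)      # (K, T)
    attn = softmax(scores, axis=1)                  # temporal attention per query
    pooled = attn @ features                        # (K, D) attended summaries
    # Each query is classified over its own label set.
    logits = [pooled[k] @ heads[k] for k in range(len(heads))]
    return attn, logits

rng = np.random.default_rng(0)
T, D = 16, 8                  # hypothetical number of timesteps and feature size
label_set_sizes = [3, 5]      # hypothetical: two queries with 3 and 5 answers
features = rng.normal(size=(T, D))
queries = rng.normal(size=(len(label_set_sizes), D))
heads = [rng.normal(size=(D, c)) for c in label_set_sizes]

attn, logits = query_response(features, queries, heads)
predictions = [int(np.argmax(l)) for l in logits]
```

Each row of `attn` sums to 1 and indicates which timesteps a query weighted most heavily, which loosely mirrors the abstract's description of attending to relevant segments for each query.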
Pages: 4484-4494
Page count: 11