HIGHER-ORDER RECURRENT NETWORK WITH SPACE-TIME ATTENTION FOR VIDEO EARLY ACTION RECOGNITION

Cited by: 1
Authors
Tai, Tsung-Ming [1 ,2 ]
Fiameni, Giuseppe [1 ]
Lee, Cheng-Kuang [1 ]
Lanz, Oswald [2 ]
Affiliations
[1] NVIDIA AI Technology Center, Taipei, Taiwan
[2] Free University of Bozen-Bolzano, Bolzano, Italy
Source
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2022
Keywords
Video prediction; early action recognition; higher-order recurrent networks; space-time attention
DOI
10.1109/ICIP46576.2022.9897974
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Endowing visual agents with predictive capability is a key step towards video intelligence at scale. Early action recognition aims to predict the action label before the video has been fully observed. Unlike standard action recognition, the model must forecast the future, or the effects of an action, from only the first few frames. Strong reasoning over the temporal dimension is therefore the key to success. To this end, we propose a novel recurrent network with decomposed space-time attention and a higher-order design to capture the temporal dependencies associated with specific actions. Our method achieves state-of-the-art performance on the Something-Something and EPIC-Kitchens datasets under the early action recognition setting, showing evidence of predictive capability that we attribute to the higher-order recurrent design with space-time attention.
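The early action recognition setting described in the abstract can be illustrated with a minimal sketch: the model is given only a leading portion of each clip. The function name `observed_prefix` and the `ratio` parameter are hypothetical and chosen for illustration; the paper's actual observation ratios and data pipeline are not stated in this record.

```python
import math
from typing import Sequence, TypeVar

Frame = TypeVar("Frame")


def observed_prefix(frames: Sequence[Frame], ratio: float) -> Sequence[Frame]:
    """Return the leading part of a clip that an early-recognition model may see.

    `ratio` is the fraction of the clip that is observed (an assumed
    parameterization, not taken from the paper).
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Keep at least one frame so the model always has some input.
    count = max(1, math.ceil(ratio * len(frames)))
    return frames[:count]
```

For an 8-frame clip and `ratio=0.25`, the model would see only the first two frames and must predict the label of the full action from them.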
Pages: 1631-1635
Number of pages: 5
Related papers
1 item in total
  • [1] Video summarization network based on Space-Time attention and genetic algorithm optimization
    Ao, Naixiang
    Shi, Fucheng
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CYBER SECURITY, ARTIFICIAL INTELLIGENCE AND DIGITAL ECONOMY, CSAIDE 2024, 2024: 420-425