VLMAH: Visual-Linguistic Modeling of Action History for Effective Action Anticipation

Cited by: 1
Authors
Manousaki, Victoria [1 ,2 ]
Bacharidis, Konstantinos [1 ,2 ]
Papoutsakis, Konstantinos [3 ]
Argyros, Antonis [1 ,2 ]
Affiliations
[1] Univ Crete, Dept Comp Sci, Iraklion, Greece
[2] FORTH, Inst Comp Sci, Iraklion, Greece
[3] Hellen Mediterranean Univ, Dept Management Sci & Technol, Khania, Greece
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW | 2023
DOI
10.1109/ICCVW60793.2023.00206
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although existing methods for action anticipation have shown considerably improved performance in predicting future events in videos, the way they exploit information about past actions is constrained by time duration and encoding complexity. This paper addresses the task of action anticipation by taking into consideration the history of all executed actions throughout long, procedural activities. A novel approach, termed Visual-Linguistic Modeling of Action History (VLMAH), is proposed that fuses the immediate past in the form of visual features as well as the distant past based on a cost-effective form of linguistic constructs (semantic labels of the nouns, verbs, or actions). Our approach generates accurate near-future action predictions during procedural activities by leveraging information on the long- and short-term past. Extensive experimental evaluation was conducted on three challenging video datasets containing procedural activities, namely Meccano, Assembly-101, and 50Salads. The results confirm that using long-term action history improves action anticipation and enhances state-of-the-art (SOTA) Top-1 accuracy.
Pages: 1909 - 1919
Number of pages: 11