VLMAH: Visual-Linguistic Modeling of Action History for Effective Action Anticipation

Cited by: 1
Authors
Manousaki, Victoria [1 ,2 ]
Bacharidis, Konstantinos [1 ,2 ]
Papoutsakis, Konstantinos [3 ]
Argyros, Antonis [1 ,2 ]
Affiliations
[1] Univ Crete, Dept Comp Sci, Iraklion, Greece
[2] FORTH, Inst Comp Sci, Iraklion, Greece
[3] Hellen Mediterranean Univ, Dept Management Sci & Technol, Khania, Greece
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW | 2023
DOI
10.1109/ICCVW60793.2023.00206
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although existing methods for action anticipation have shown considerably improved performance in predicting future events in videos, the way they exploit information about past actions is constrained by time duration and encoding complexity. This paper addresses the task of action anticipation by taking into consideration the history of all executed actions throughout long, procedural activities. A novel approach, termed Visual-Linguistic Modeling of Action History (VLMAH), is proposed that fuses the immediate past in the form of visual features as well as the distant past based on a cost-effective form of linguistic constructs (semantic labels of the nouns, verbs, or actions). Our approach generates accurate near-future action predictions during procedural activities by leveraging information on the long- and short-term past. Extensive experimental evaluation was conducted on three challenging video datasets containing procedural activities, namely Meccano, Assembly-101, and 50Salads. The results confirm that using long-term action history improves action anticipation and enhances state-of-the-art (SOTA) Top-1 accuracy.
Pages: 1909 - 1919
Number of pages: 11