Egocentric action anticipation from untrimmed videos

Citations: 0
Authors
Rodin, Ivan [1]
Furnari, Antonino [1, 2]
Farinella, Giovanni Maria [1, 2]
Affiliations
[1] Univ Catania, Catania, Italy
[2] Univ Catania, Next Vis srl Spinoff, Catania, Italy
Keywords
computer vision; pattern recognition
DOI
10.1049/cvi2.12342
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are 'trimmed', meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where input videos are 'untrimmed' and the exact moment at which an action begins cannot be assumed at test time. To address these limitations, the authors propose an untrimmed action anticipation task which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. By experimentally evaluating a variety of models, the authors aim to better understand their performance in untrimmed action anticipation. The results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.
Pages: 11