Egocentric action anticipation from untrimmed videos

Authors
Rodin, Ivan [1]
Furnari, Antonino [1, 2]
Farinella, Giovanni Maria [1, 2]
Affiliations
[1] Univ Catania, Catania, Italy
[2] Univ Catania, Next Vis srl Spinoff, Catania, Italy
Keywords
computer vision; pattern recognition
DOI
10.1049/cvi2.12342
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are 'trimmed', meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where it is crucial to deal with 'untrimmed' video inputs and the exact moment of action initiation cannot be assumed at test time. To address these limitations, an untrimmed action anticipation task is proposed, which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through an experimental evaluation of a variety of models, the authors aim to better understand how these models perform in untrimmed action anticipation. The results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.
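The contrast between the two settings described in the abstract can be illustrated with a minimal sketch. It is not the authors' benchmark procedure: the `Action` class, the function names, and the default values for `anticipation`, `observed`, `stride`, and `horizon` are all hypothetical choices made for illustration. The trimmed setting uses the known action start to place the input clip, while the untrimmed setting only walks through the video at regular timestamps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Action:
    """A labelled action segment (hypothetical structure, times in seconds)."""
    label: str
    start: float
    end: float


def trimmed_clip(action: Action, anticipation: float = 1.0,
                 observed: float = 2.0) -> Tuple[float, float]:
    """Trimmed setting: the clip ends a fixed time before the KNOWN action start.

    The default durations are illustrative assumptions, not values from the paper.
    """
    clip_end = max(0.0, action.start - anticipation)
    clip_start = max(0.0, clip_end - observed)
    return (clip_start, clip_end)


def untrimmed_timestamps(video_length: float, stride: float = 1.0) -> List[float]:
    """Untrimmed setting: prediction times are chosen without knowing action starts."""
    n = int(video_length // stride)
    return [i * stride for i in range(1, n + 1)]


def upcoming_actions(t: float, actions: List[Action],
                     horizon: float = 5.0) -> List[str]:
    """Labels of actions starting in (t, t + horizon].

    This is one plausible way to define what a prediction made at time t
    could be compared against; the paper's actual protocol may differ.
    """
    return [a.label for a in actions if t < a.start <= t + horizon]


if __name__ == "__main__":
    actions = [Action("open fridge", 4.0, 6.0), Action("take milk", 7.5, 9.0)]
    print(trimmed_clip(actions[0]))
    for t in untrimmed_timestamps(9.0, stride=3.0):
        print(t, upcoming_actions(t, actions))
```

In the trimmed case the annotation decides where the model looks; in the untrimmed case the model must produce a prediction at every timestamp, including those where no action is about to start.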
Pages: 11