Egocentric action anticipation from untrimmed videos

Cited by: 0
Authors
Rodin, Ivan [1]
Furnari, Antonino [1,2]
Farinella, Giovanni Maria [1,2]
Affiliations
[1] Univ Catania, Catania, Italy
[2] Univ Catania, Next Vis srl Spinoff, Catania, Italy
Keywords
computer vision; pattern recognition
DOI
10.1049/cvi2.12342
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Egocentric action anticipation involves predicting the future actions of the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are 'trimmed', meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where systems must deal with 'untrimmed' video inputs and the exact moment of action initiation cannot be assumed to be known at test time. To address these limitations, an untrimmed action anticipation task is proposed which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through an experimental evaluation of a variety of models, the authors aim to better understand their performance in untrimmed action anticipation. The results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.
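To make the distinction between the two settings concrete, below is a minimal sketch, not taken from the paper: the `Video` class, the observation and anticipation windows `tau_o`/`tau_a`, and the query `stride` are all illustrative assumptions. It contrasts how an observed clip is sampled in the trimmed setting (anchored to a known action start) with how a model must be queried in the untrimmed setting (at regular timesteps over the whole video):

```python
# Illustrative sketch only: not the authors' implementation; all names
# and window lengths below are assumed for the example.
from dataclasses import dataclass


@dataclass
class Video:
    duration: float  # length in seconds

    def segment(self, start: float, end: float):
        # Stand-in for extracting frames/features over [start, end].
        return (max(start, 0.0), end)


def trimmed_input(video, action_start, tau_o=2.0, tau_a=1.0):
    """Trimmed setting: the observed clip ends exactly tau_a seconds
    before the annotated action start, which must be known in advance."""
    return video.segment(action_start - tau_a - tau_o, action_start - tau_a)


def untrimmed_inputs(video, stride=0.5, tau_o=2.0):
    """Untrimmed setting: action start times are unknown at test time,
    so the model is queried at regular timesteps over the whole video
    and must anticipate upcoming actions at each step."""
    t = tau_o
    while t <= video.duration:
        yield t, video.segment(t - tau_o, t)  # prediction issued at time t
        t += stride


video = Video(duration=10.0)
print(trimmed_input(video, action_start=5.0))  # one clip per annotated action
print(len(list(untrimmed_inputs(video))))      # many query points, no annotations
```

In the untrimmed setting the model must produce a prediction at every query point rather than only at annotated offsets, which is the evaluation regime the abstract describes.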
Pages: 11
Related papers
50 items in total
  • [41] Supervised classification of bradykinesia in Parkinson's disease from smartphone videos
    Williams, Stefan
    Relton, Samuel D.
    Fang, Hui
    Alty, Jane
    Qahwaji, Rami
    Graham, Christopher D.
    Wong, David C.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2020, 110 (110)
  • [42] Semantic2Graph: graph-based multi-modal feature fusion for action segmentation in videos
    Zhang, Junbin
    Tsai, Pei-Hsuan
    Tsai, Meng-Hsun
    APPLIED INTELLIGENCE, 2024, 54 (02) : 2084 - 2099
  • [44] Action recognition using fast HOG3D of integral videos and Smith-Waterman partial matching
    El-Henawy, Ibrahim
    Ahmed, Kareem
    Mahmoud, Hamdi
    IET IMAGE PROCESSING, 2018, 12 (06) : 896 - 908
  • [45] Hybrid Features and Deep Learning Model for Facial Expression Recognition From Videos
    Gavade, Priyanka A.
    Bhat, Vandana S.
    Pujari, Jagadeesh
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2023, 23 (05)
  • [46] Rule-based systems to automatically count bites from meal videos
    Tufano, Michele
    Lasschuijt, Marlou P.
    Chauhan, Aneesh
    Feskens, Edith J. M.
    Camps, Guido
    FRONTIERS IN NUTRITION, 2024, 11
  • [47] Learning multiview deep features from skeletal sign language videos for recognition
    Shaik, Ashraf Ali
    Mareedu, Venkata Durga Prasad
    Polurie, Venkata Vijaya Kishore
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2021, 29 (02) : 1061 - 1076
  • [48] Pixel-wise structural motion tracking from rectified repurposed videos
    Khaloo, Ali
    Lattanzi, David
    STRUCTURAL CONTROL & HEALTH MONITORING, 2017, 24 (11)
  • [49] Supervised classification of bradykinesia for Parkinson's disease diagnosis from smartphone videos
    Wong, David C.
    Relton, Samuel D.
    Fang, Hui
    Qahwaji, Rami
    Graham, Christopher D.
    Alty, Jane
    Williams, Stefan
    2019 IEEE 32ND INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2019, : 32 - 37
  • [50] STFormer: Spatio-temporal former for hand-object interaction recognition from egocentric RGB video
    Liang, Jiao
    Wang, Xihan
    Yang, Jiayi
    Gao, Quanli
    ELECTRONICS LETTERS, 2024, 60 (17)