AMEGO: Active Memory from Long EGOcentric Videos

Cited: 0
Authors
Goletto, Gabriele [1]
Nagarajan, Tushar [2]
Averta, Giuseppe [1]
Damen, Dima [3]
Affiliations
[1] Politecn Torino, Turin, Italy
[2] Meta, FAIR, Austin, TX USA
[3] Univ Bristol, Bristol, Avon, England
Source
COMPUTER VISION - ECCV 2024, PT XIII | 2025 / Vol. 15071
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Long video understanding; Egocentric vision;
DOI
10.1007/978-3-031-72624-8_6
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by humans' ability to retain information from a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video-understanding capabilities. We showcase the improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.
Pages: 92 - 110
Page count: 19
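The abstract describes AMEGO as building a semantic-free, self-contained memory of key locations and object interactions that can then answer repeated queries (sequencing, concurrency, temporal grounding) without re-reading the video. The sketch below is only an illustrative guess at what such a queryable memory could look like; the names (ActiveMemory, ObjectTrack, LocationSegment) and query methods are hypothetical assumptions, not AMEGO's actual data structures or API.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ObjectTrack:
    # One tracked hand-object interaction, stored as (start_s, end_s) intervals.
    # (Hypothetical structure, not taken from the paper.)
    track_id: int
    segments: List[Tuple[float, float]]

@dataclass
class LocationSegment:
    # One visit to a location identified only by a semantic-free id.
    location_id: int
    start_s: float
    end_s: float

@dataclass
class ActiveMemory:
    # Built once from the long video; queried many times afterwards.
    object_tracks: List[ObjectTrack] = field(default_factory=list)
    location_visits: List[LocationSegment] = field(default_factory=list)

    def objects_at(self, t: float) -> List[int]:
        """Temporal grounding: objects being interacted with at time t (seconds)."""
        return [o.track_id for o in self.object_tracks
                if any(s <= t <= e for s, e in o.segments)]

    def location_at(self, t: float) -> Optional[int]:
        """Temporal grounding: the location the camera wearer is in at time t."""
        for v in self.location_visits:
            if v.start_s <= t <= v.end_s:
                return v.location_id
        return None

    def interaction_order(self) -> List[int]:
        """Sequencing: object track ids ordered by first interaction time."""
        firsts = [(min(s for s, _ in o.segments), o.track_id)
                  for o in self.object_tracks if o.segments]
        return [tid for _, tid in sorted(firsts)]

    def concurrent_with(self, track_id: int) -> List[int]:
        """Concurrency: other objects whose interactions overlap the given track."""
        ref = next((o for o in self.object_tracks if o.track_id == track_id), None)
        if ref is None:
            return []
        overlapping = []
        for o in self.object_tracks:
            if o.track_id == track_id:
                continue
            if any(s1 <= e2 and s2 <= e1
                   for s1, e1 in ref.segments for s2, e2 in o.segments):
                overlapping.append(o.track_id)
        return overlapping

# Toy example: once the memory exists, queries never touch the video again.
memory = ActiveMemory(
    object_tracks=[ObjectTrack(0, [(3.0, 10.0)]), ObjectTrack(1, [(8.0, 15.0)])],
    location_visits=[LocationSegment(0, 0.0, 20.0)],
)
assert memory.objects_at(9.0) == [0, 1]
assert memory.interaction_order() == [0, 1]
assert memory.concurrent_with(0) == [1]
```

Under these assumptions, sequencing queries reduce to sorting interaction start times and grounding or concurrency queries to interval lookups, which is what makes repeated querying cheap once the representation has been built from a single pass over the video.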