AMEGO: Active Memory from Long EGOcentric Videos

Cited: 0
Authors
Goletto, Gabriele [1]
Nagarajan, Tushar [2]
Averta, Giuseppe [1]
Damen, Dima [3]
Affiliations
[1] Politecn Torino, Turin, Italy
[2] Meta, FAIR, Austin, TX USA
[3] Univ Bristol, Bristol, Avon, England
Source
COMPUTER VISION - ECCV 2024, PT XIII | 2025 / Vol. 15071
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Long video understanding; Egocentric vision;
DOI
10.1007/978-3-031-72624-8_6
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by humans' ability to retain information from a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video-understanding capabilities. We showcase the improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.
Pages: 92 - 110
Page count: 19
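The abstract describes AMEGO as building a semantic-free, self-contained memory of key locations and object interactions that can then answer repeated queries (sequencing, concurrency, temporal grounding) without re-reading the video. The sketch below is only an illustrative guess at what such a queryable memory could look like; the names (ActiveMemory, ObjectTrack, LocationSegment) and query methods are hypothetical assumptions, not AMEGO's actual data structures or API.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ObjectTrack:
    # One tracked hand-object interaction, stored as (start_s, end_s) intervals.
    # (Hypothetical structure, not taken from the paper.)
    track_id: int
    segments: List[Tuple[float, float]]

@dataclass
class LocationSegment:
    # One visit to a location identified only by a semantic-free id.
    location_id: int
    start_s: float
    end_s: float

@dataclass
class ActiveMemory:
    # Built once from the long video; queried many times afterwards.
    object_tracks: List[ObjectTrack] = field(default_factory=list)
    location_visits: List[LocationSegment] = field(default_factory=list)

    def objects_at(self, t: float) -> List[int]:
        """Temporal grounding: objects being interacted with at time t (seconds)."""
        return [o.track_id for o in self.object_tracks
                if any(s <= t <= e for s, e in o.segments)]

    def location_at(self, t: float) -> Optional[int]:
        """Temporal grounding: the location the camera wearer is in at time t."""
        for v in self.location_visits:
            if v.start_s <= t <= v.end_s:
                return v.location_id
        return None

    def interaction_order(self) -> List[int]:
        """Sequencing: object track ids ordered by first interaction time."""
        firsts = [(min(s for s, _ in o.segments), o.track_id)
                  for o in self.object_tracks if o.segments]
        return [tid for _, tid in sorted(firsts)]

    def concurrent_with(self, track_id: int) -> List[int]:
        """Concurrency: other objects whose interactions overlap the given track."""
        ref = next((o for o in self.object_tracks if o.track_id == track_id), None)
        if ref is None:
            return []
        overlapping = []
        for o in self.object_tracks:
            if o.track_id == track_id:
                continue
            if any(s1 <= e2 and s2 <= e1
                   for s1, e1 in ref.segments for s2, e2 in o.segments):
                overlapping.append(o.track_id)
        return overlapping

# Toy example: once the memory exists, queries never touch the video again.
memory = ActiveMemory(
    object_tracks=[ObjectTrack(0, [(3.0, 10.0)]), ObjectTrack(1, [(8.0, 15.0)])],
    location_visits=[LocationSegment(0, 0.0, 20.0)],
)
assert memory.objects_at(9.0) == [0, 1]
assert memory.interaction_order() == [0, 1]
assert memory.concurrent_with(0) == [1]
```

Under these assumptions, sequencing queries reduce to sorting interaction start times and grounding or concurrency queries to interval lookups, which is what makes repeated querying cheap once the representation has been built from a single pass over the video.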