Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100

Cited by: 215
Authors
Damen, Dima [1 ]
Doughty, Hazel [1 ,3 ]
Farinella, Giovanni Maria [2 ]
Furnari, Antonino [2 ]
Kazakos, Evangelos [1 ]
Ma, Jian [1 ]
Moltisanti, Davide [1 ,4 ]
Munro, Jonathan [1 ]
Perrett, Toby [1 ]
Price, Will [1 ]
Wray, Michael [1 ]
Affiliations
[1] Univ Bristol, Bristol, Avon, England
[2] Univ Catania, Catania, Italy
[3] Univ Amsterdam, Amsterdam, Netherlands
[4] Nanyang Technol Univ, Singapore, Singapore
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Video dataset; Egocentric vision; First-person vision; Action understanding; Multi-benchmark large-scale dataset; Annotation quality; Domain adaptation
DOI
10.1007/s11263-021-01531-2
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments with head-mounted cameras. Compared to its previous version (Damen et al., Scaling Egocentric Vision, ECCV 2018), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotation of fine-grained actions (+128% more action segments). This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
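The headline statistics in the abstract (100 hours, 20M frames, 90K action segments) imply some simple derived figures. A minimal back-of-envelope sketch, using only the numbers stated above; the derived averages are assumptions (segments may overlap, so the mean segment length is only an approximation):

```python
# Back-of-envelope check on the EPIC-KITCHENS-100 headline statistics
# reported in the abstract: 100 hours, ~20M frames, ~90K action segments.
HOURS = 100
ACTIONS = 90_000
FRAMES = 20_000_000

minutes = HOURS * 60
actions_per_minute = ACTIONS / minutes            # annotation density
avg_fps = FRAMES / (HOURS * 3600)                 # implied average frame rate
mean_action_seconds = (HOURS * 3600) / ACTIONS    # rough mean time per action
                                                  # (approximate: segments can overlap)

print(actions_per_minute)             # 15.0 actions per minute
print(round(avg_fps, 1))              # ~55.6 fps on average
print(round(mean_action_seconds, 1))  # ~4.0 s per action
```

The ~15 actions per minute is the density the abstract's "54% more actions per minute" claim refers to, relative to the original EPIC-KITCHENS annotation pipeline.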
Pages: 33-55
Page count: 23
References (129 in total)
[1] [Anonymous] (2016). HUMAN ACTION LOCALIZ
[2] [Anonymous] (2008). Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database.
[3] Bearman, A., Russakovsky, O., Ferrari, V., & Fei-Fei, L. (2016). What's the Point: Semantic Segmentation with Point Supervision. Computer Vision - ECCV 2016, Pt. VII, Lect. Notes Comput. Sci. 9911, 549-565.
[4] Bhattacharyya, A. (2019). ICLR.
[5] Bojanowski, P. (2014). Lect. Notes Comput. Sci. 8693, 628. DOI: 10.1007/978-3-319-10602-1_41.
[6] Heilbron, F. C. (2015). Proc. CVPR (IEEE), 961. DOI: 10.1109/CVPR.2015.7298698.
[7] Caesar, H. P IEEE CVF C COMP VI
[8] Cao, Y. (2017). BMVC.
[9] Caputo, B. (2014). Lect. Notes Comput. Sci., 192.
[10] Carlevaris-Bianco, N., Ushani, A. K., & Eustice, R. M. (2016). University of Michigan North Campus long-term vision and lidar dataset. International Journal of Robotics Research, 35(9), 1023-1035.