Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Cited by: 3
Authors
Mazzamuto, Michele [1]
Ragusa, Francesco [1, 2]
Furnari, Antonino [1, 2]
Signorello, Giovanni [3]
Farinella, Giovanni Maria [1, 2, 3]
Affiliations
[1] Univ Catania, DMI, FPV IPLAB, Catania, Italy
[2] Univ Catania, Next Vis Srl Spinoff, Catania, Italy
[3] Univ Catania, CUTGANA, Catania, Italy
Source
IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT II | 2022 / Vol. 13232
Keywords
Egocentric vision; Weakly supervised object detection
DOI
10.1007/978-3-031-06430-2_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of cost and time, we propose a weakly supervised version of the task which relies only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We then compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/WS_OBJ.DET/.
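The "standard approach" mentioned in the abstract (detect all objects, then keep the one that best matches the gaze) can be illustrated with a minimal sketch. This is not the authors' code: the `Box` type, the `select_attended` function, and the matching rule (keep boxes that contain the gaze point, then prefer the box whose center is closest to it) are assumptions made only for illustration, and the abstract does not specify how overlap with the gaze is measured.

```python
from math import hypot
from typing import NamedTuple, Optional, Sequence, Tuple


class Box(NamedTuple):
    """Hypothetical detection output: corner coordinates in pixels plus a class label."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str


def select_attended(boxes: Sequence[Box], gaze: Tuple[float, float]) -> Optional[Box]:
    """Pick the detected box assumed to be attended, given a 2D gaze point.

    Assumed rule (not taken from the paper): among boxes containing the
    gaze point, return the one whose center is nearest to it; return None
    when no box contains the gaze point.
    """
    gx, gy = gaze
    # Keep only boxes that contain the gaze point.
    hits = [b for b in boxes if b.x1 <= gx <= b.x2 and b.y1 <= gy <= b.y2]
    if not hits:
        return None

    def center_distance(b: Box) -> float:
        # Euclidean distance from the box center to the gaze point.
        return hypot((b.x1 + b.x2) / 2 - gx, (b.y1 + b.y2) / 2 - gy)

    return min(hits, key=center_distance)


detections = [Box(0, 0, 100, 100, "statue"), Box(40, 40, 70, 70, "vase")]
attended = select_attended(detections, (54, 56))
```

In this toy example both boxes contain the gaze point, and the smaller "vase" box is selected because its center lies closer to the gaze. A fully supervised pipeline of this kind needs box annotations to train the detector, which is the labeling cost the weakly supervised formulation aims to avoid.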
Pages: 263-274
Page count: 12