Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Cited by: 3

Authors:

Mazzamuto, Michele [1]

Ragusa, Francesco [1,2]

Furnari, Antonino [1,2]

Signorello, Giovanni [3]

Farinella, Giovanni Maria [1,2,3]

Affiliations:

[1] Univ Catania, DMI, FPV IPLAB, Catania, Italy

[2] Univ Catania, Next Vis Srl Spinoff, Catania, Italy

[3] Univ Catania, CUTGANA, Catania, Italy

Source:

IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT II | 2022 / Vol. 13232

Keywords:

Egocentric vision; Weakly supervised object detection

DOI:

10.1007/978-3-031-06430-2_22

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one that best overlaps with the visitor's gaze, as measured by a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in both cost and time, we propose a weakly supervised version of the task that relies only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We then compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/WS_OBJ.DET/.
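The "standard approach" mentioned in the abstract (detect all objects, then pick the one that best matches the gaze) can be illustrated with a minimal sketch. The abstract does not specify the overlap criterion, so the rule used here is an assumption: a box is a candidate only if it contains the gaze point, and among candidates the one whose center is closest to the gaze wins. The function name `select_attended_box` and the box format are hypothetical and not taken from the released code.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def select_attended_box(boxes: List[Box], gaze: Tuple[float, float]) -> Optional[int]:
    """Return the index of the box assumed to be attended, or None.

    Assumed criterion (not from the paper): keep boxes that contain the
    gaze point, then choose the one whose center is nearest to the gaze.
    """
    gx, gy = gaze
    best_index, best_distance = None, float("inf")
    for index, (x0, y0, x1, y1) in enumerate(boxes):
        if not (x0 <= gx <= x1 and y0 <= gy <= y1):
            continue  # the gaze point falls outside this box
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        distance = (gx - cx) ** 2 + (gy - cy) ** 2  # squared distance to center
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


# Hypothetical detections: one large box, one small box inside it, one far away.
detections = [(0, 0, 100, 100), (60, 60, 90, 90), (200, 200, 300, 300)]
attended = select_attended_box(detections, gaze=(70, 72))
```

In a fully supervised pipeline the boxes would come from a detector such as Faster R-CNN; the weakly supervised setting described in the abstract avoids the box annotations needed to train such a detector.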


Pages: 263-274

Page count: 12

