Salient object detection in egocentric videos

Cited by: 1
Authors
Zhang, Hao [1 ]
Liang, Haoran [1 ]
Zhao, Xing [1 ]
Liu, Jian [1 ]
Liang, Ronghua [1 ]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
image processing; object detection; segmentation; tracking
DOI
10.1049/ipr2.13080
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the realm of video salient object detection (VSOD), most research has traditionally centered on third-person perspective videos. However, this focus overlooks the distinct requirements of first-person tasks such as autonomous driving and robot vision. To bridge this gap, a novel dataset and a camera-movement-based VSOD model, CaMSD, designed specifically for egocentric videos, are introduced. First, the SalEgo dataset, comprising 17,400 fully annotated frames for video salient object detection, is presented. Second, a computational model incorporating a camera movement module is proposed, designed to emulate the viewing patterns humans exhibit when watching videos. Additionally, to segment a single salient object precisely during switches between salient objects, rather than segmenting two objects simultaneously, a saliency enhancement module based on the Squeeze-and-Excitation block is incorporated. Experimental results show that the approach outperforms other state-of-the-art methods on egocentric video salient object detection tasks. The dataset and code can be found at .
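The abstract builds its saliency enhancement module on the Squeeze-and-Excitation (SE) block. For orientation, below is a minimal, generic PyTorch sketch of a standard SE block (Hu et al., 2018); it illustrates only the channel-recalibration idea and is not the paper's actual module. The class name SEBlock, the channel count, and the reduction ratio of 16 are assumptions for the example.

import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Generic Squeeze-and-Excitation block (illustrative sketch, not CaMSD's module)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling collapses each feature map to one scalar per channel.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a two-layer bottleneck MLP produces per-channel weights in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)        # squeeze: (B, C)
        w = self.fc(w).view(b, c, 1, 1)    # excitation: per-channel attention weights
        return x * w                       # recalibrate the input features channel-wise


if __name__ == "__main__":
    feats = torch.randn(2, 64, 56, 56)     # dummy feature maps (assumed shape)
    print(SEBlock(64)(feats).shape)        # torch.Size([2, 64, 56, 56])

The appeal of this kind of block for emphasizing one salient object over another is that the learned channel weights can suppress feature channels responding to the non-target object while amplifying those of the target, at negligible computational cost.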
Pages: 2028-2037
Number of pages: 10