A dual-stream encoder-decoder network with attention mechanism for saliency detection in video(s)

Cited by: 2
Authors
Kumain, Sandeep Chand [1 ]
Singh, Maheep [1 ]
Awasthi, Lalit Kumar [1 ]
Affiliations
[1] National Institute of Technology, Department of Computer Science & Engineering, Srinagar, Garhwal, India
Keywords
Salient Object Detection (SOD); Video Salient Object Detection (VSOD); Video summarization; Convolutional neural network; Attention mechanism; Optimization; Model
DOI
10.1007/s11760-023-02833-3
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Salient Object Detection (SOD) is a crucial task within the domain of digital image processing that aims to detect objects in images or videos which attract special human attention. These visually attentive objects are referred to as salient objects in computer vision and image processing. The automatic recognition of these attention-grabbing objects is of considerable importance for various applications such as video summarization, automated cropping for compression purposes, image and video captioning, and action recognition. Over the last two decades, the research community has proposed various methods to mimic the human visual capability of finding the object(s) that receive the most attention. Early methodologies relied primarily on conventional approaches, but more recently, deep learning-based techniques have gained significant interest and popularity in the domain of salient object detection in images and videos. In this work, the authors introduce an innovative model that employs a dual-stream encoder-decoder architecture for accurate saliency estimation in videos. Integrating an attention mechanism and non-local blocks makes the network more robust, leading to improved identification of salient objects. To assess the proposed model's effectiveness, comprehensive evaluations were conducted on well-known publicly available datasets such as VOS, DAVSOD, and ViSAL. The experimental results demonstrate that the proposed model achieves competitive performance compared to state-of-the-art methods on the S-Measure, F-Measure, and MAE evaluation metrics.
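
The abstract describes the architecture only at a high level. The following minimal PyTorch sketch illustrates one way a dual-stream encoder-decoder with an attention-style non-local block could be assembled, assuming an RGB appearance stream and a motion (optical-flow) stream as the two inputs; all layer widths, module names (ConvBlock, NonLocalBlock, DualStreamSaliencyNet), and the fusion scheme are illustrative assumptions, not the authors' exact design.

# Minimal sketch of a dual-stream encoder-decoder for video saliency.
# Assumptions: two 3-channel inputs (an RGB frame and a flow visualization),
# late fusion by concatenation, and a single non-local attention block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Two 3x3 convolutions with BatchNorm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class NonLocalBlock(nn.Module):
    """Self-attention over all spatial positions, with a residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.theta = nn.Conv2d(ch, ch // 2, 1)
        self.phi = nn.Conv2d(ch, ch // 2, 1)
        self.g = nn.Conv2d(ch, ch // 2, 1)
        self.out = nn.Conv2d(ch // 2, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # B x HW x C/2
        k = self.phi(x).flatten(2)                     # B x C/2 x HW
        v = self.g(x).flatten(2).transpose(1, 2)       # B x HW x C/2
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)


class DualStreamSaliencyNet(nn.Module):
    """Appearance and motion encoders, fused and decoded into a saliency map."""
    def __init__(self, base=32):
        super().__init__()
        self.rgb_enc = nn.ModuleList([ConvBlock(3, base), ConvBlock(base, base * 2)])
        self.flow_enc = nn.ModuleList([ConvBlock(3, base), ConvBlock(base, base * 2)])
        self.fuse = ConvBlock(base * 4, base * 2)
        self.non_local = NonLocalBlock(base * 2)
        self.dec = ConvBlock(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, rgb, flow):
        r, f = rgb, flow
        for r_blk, f_blk in zip(self.rgb_enc, self.flow_enc):
            r = F.max_pool2d(r_blk(r), 2)   # appearance stream, downsampled
            f = F.max_pool2d(f_blk(f), 2)   # motion stream, downsampled
        x = self.non_local(self.fuse(torch.cat([r, f], dim=1)))
        x = F.interpolate(self.dec(x), scale_factor=4, mode="bilinear", align_corners=False)
        return torch.sigmoid(self.head(x))  # per-pixel saliency in [0, 1]


if __name__ == "__main__":
    net = DualStreamSaliencyNet()
    rgb = torch.randn(1, 3, 224, 224)    # RGB frame (appearance)
    flow = torch.randn(1, 3, 224, 224)   # assumed 3-channel flow visualization (motion)
    print(net(rgb, flow).shape)          # torch.Size([1, 1, 224, 224])

In practice the hand-written encoders above would typically be replaced with pretrained backbones, and the network trained with a per-pixel loss against ground-truth saliency masks before being scored with S-Measure, F-Measure, and MAE.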
Pages: 2037-2046
Number of pages: 10