CASNet: A Cross-Attention Siamese Network for Video Salient Object Detection

Cited by: 93
Authors
Ji, Yuzhu [1 ]
Zhang, Haijun [1 ]
Jie, Zequn [2 ]
Ma, Lin [2 ]
Wu, Q. M. Jonathan [3 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen 518055, Peoples R China
[2] Tencent AI Lab, Shenzhen 518057, Peoples R China
[3] Univ Windsor, Dept Elect & Comp Engn, Windsor, ON N9B 3P4, Canada
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Data models; Saliency detection; Feature extraction; Object oriented modeling; Computational modeling; Optical imaging; Cross attention; inter- and intraframe saliency; salient object; video saliency; SEGMENTATION; OPTIMIZATION; FUSION;
DOI
10.1109/TNNLS.2020.3007534
CLC number
TP18 [theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent work on video salient object detection has demonstrated that directly transferring the generalization ability of image-based models to video data without modeling spatial-temporal information remains nontrivial and challenging. Considering both intraframe accuracy and interframe consistency of saliency detection, this article presents a novel cross-attention-based encoder-decoder model under the Siamese framework (CASNet) for video salient object detection. A baseline encoder-decoder model trained with the Lovász-Softmax loss function is adopted as the backbone network to guarantee the accuracy of intraframe salient object detection. Self- and cross-attention modules are incorporated into the model in order to preserve the saliency correlation between frames and improve interframe saliency detection consistency. Extensive experimental results obtained by ablation analysis and cross-data-set validation demonstrate the effectiveness of the proposed method. Quantitative results indicate that CASNet outperforms 19 state-of-the-art image- and video-based methods on six benchmark data sets.
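The core idea the abstract describes, attending the features of one frame to those of a sibling frame in the Siamese pair, can be illustrated with a minimal dot-product cross-attention sketch. This is not the paper's implementation; the function names, the plain dot-product affinity, and the flattened (positions x channels) feature layout are all illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feat_a, feat_b):
    """Aggregate frame-B features for each position of frame A.

    feat_a, feat_b: (N, C) arrays -- N flattened spatial positions,
    C channels per position (hypothetical layout, not from the paper).
    """
    affinity = feat_a @ feat_b.T          # (N, N) pairwise similarity
    weights = softmax(affinity, axis=-1)  # normalize over frame-B positions
    return weights @ feat_b               # (N, C) frame-B context for frame A
```

Self-attention is the special case `cross_attention(feat_a, feat_a)`; running both directions (A attends to B, B attends to A) gives each branch of the Siamese pair a view of the other frame, which is what ties the per-frame saliency maps together.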
Pages: 2676-2690 (15 pages)
References
68 in total
[1]  
Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[2]   The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks [J].
Berman, Maxim ;
Triki, Amal Rannen ;
Blaschko, Matthew B. .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :4413-4421
[3]  
Borji Ali, 2019, Computational Visual Media, V5, P117
[4]  
Brox T, 2010, LECT NOTES COMPUT SC, V6315, P282, DOI 10.1007/978-3-642-15555-0_21
[5]   Video Saliency Detection via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion [J].
Chen, Chenglizhao ;
Li, Shuai ;
Wang, Yongguang ;
Qin, Hong ;
Hao, Aimin .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (07) :3156-3170
[6]  
Chen JZ, 2016, PROCEEDINGS OF 2016 12TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), P551, DOI [10.1109/CIS.2016.133, 10.1109/CIS.2016.0134]
[7]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[8]   SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning [J].
Chen, Long ;
Zhang, Hanwang ;
Xiao, Jun ;
Nie, Liqiang ;
Shao, Jian ;
Liu, Wei ;
Chua, Tat-Seng .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6298-6306
[9]   Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model [J].
Cornia, Marcella ;
Baraldi, Lorenzo ;
Serra, Giuseppe ;
Cucchiara, Rita .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (10) :5142-5154
[10]   Long-Term Recurrent Convolutional Networks for Visual Recognition and Description [J].
Donahue, Jeff ;
Hendricks, Lisa Anne ;
Rohrbach, Marcus ;
Venugopalan, Subhashini ;
Guadarrama, Sergio ;
Saenko, Kate ;
Darrell, Trevor .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (04) :677-691