Learning channel-wise spatio-temporal representations for video salient object detection

Cited by: 8

Authors:

Huang, Kan ^{[1,2]}

Li, Ge ^{[1,2]}

Liu, Shan ^{[3]}

Affiliations:

[1] Peking Univ, Shenzhen Grad Sch, Sch Elect & Comp Engn, Shenzhen 518055, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China

[3] Tencent Inc, Shenzhen 518000, Peoples R China

Source:

NEUROCOMPUTING | 2020 / Vol. 403

Funding:

National Natural Science Foundation of China;

Keywords:

OPTIMIZATION;

DOI:

10.1016/j.neucom.2020.04.015

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video salient object detection aims at extracting the most attention-grabbing objects in videos, which can greatly benefit many vision-based tasks such as video understanding. In this work we explore this research issue from a novel perspective, i.e., learning the spatio-temporal representations associated with salient regions in separate feature channels. We propose a Channel-wise Spatio-Temporal Representation learning block (CSTR), which is trained to discriminate between salient and non-salient spatio-temporal patterns in separate channels. A complete CNN architecture based on this block is constructed for video salient object detection. This architecture combines dynamic saliency learned from CSTR with static saliency learned from a Multi-scale Dilated Convolution block (MDC) to derive the final saliency detection results. This combination improves feature representation capability, which contributes to more precise detection results. Compared with previous works that leverage optical flow or RNNs (LSTM, GRU, etc.) to exploit temporal cues, the proposed method is simple to implement and offers an intuitive way to understand how spatio-temporal patterns are correlated with salient regions. Extensive experimental evaluations verify the effectiveness of the proposed method and confirm that the proposed model outperforms other state-of-the-art methods on four popular benchmarks. © 2020 Elsevier B.V.


Pages: 325-336

Page count: 12
