ABSNet: Aesthetics-Based Saliency Network Using Multi-Task Convolutional Network

Times Cited: 6
Authors
Liu, Jing [1 ]
Lv, Jincheng [1 ]
Yuan, Min [1 ]
Zhang, Jing [1 ]
Su, Yuting [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Visualization; Task analysis; Feature extraction; Saliency detection; Signal processing algorithms; Prediction algorithms; Semantics; Aesthetics assessment; multi-task learning; visual saliency detection; ENCODER-DECODER NETWORK; VISUAL-ATTENTION; OBJECT DETECTION; MODEL;
DOI
10.1109/LSP.2020.3035065
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
As a core mechanism of visual attention for analyzing visual scenes, visual saliency has been shown to correlate closely with semantic information such as faces. Although many semantic-information-guided saliency models have been proposed, to the best of our knowledge, no semantic information from the affective domain has been employed for saliency detection. Aesthetics, the affective perceptual quality that integrates factors such as scene composition and contrast, can therefore benefit visual attention, which depends heavily on these same visual factors. In this letter, we propose an end-to-end multi-task framework called the aesthetics-based saliency network (ABSNet). We adopt three commonly used backbones as the shared network and design two distinct task-specific branches. Mean squared error (MSE) loss and Earth Mover's Distance (EMD) loss are jointly adopted to alternately train the shared network and the individual branches, enabling the model to extract more effective features for visual perception. Moreover, our model is resolution-friendly and can predict saliency for images of arbitrary size. Experiments show that the proposed multi-task method is superior to its single-task counterpart and outperforms state-of-the-art saliency methods.
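The abstract only outlines the approach; the following is a minimal, illustrative sketch of a shared backbone with a saliency branch and an aesthetics branch, trained alternately with MSE and EMD losses as described above. The framework (PyTorch), the VGG-16 backbone, the branch designs, the number of aesthetic score bins, and the single shared optimizer are assumptions for illustration only, not the authors' exact architecture or training setup.

```python
# Minimal sketch (PyTorch assumed): shared backbone + two task branches,
# trained alternately with MSE (saliency) and EMD (aesthetic distribution).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class ABSNetSketch(nn.Module):
    def __init__(self, num_score_bins=10):
        super().__init__()
        # Shared backbone; VGG-16 features used here purely as an example.
        self.backbone = models.vgg16(weights=None).features
        # Saliency branch: fully convolutional, so arbitrary input sizes work.
        self.saliency_head = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1), nn.Sigmoid(),
        )
        # Aesthetics branch: global pooling + distribution over score bins.
        self.aesthetic_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, num_score_bins), nn.Softmax(dim=1),
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.saliency_head(feats), self.aesthetic_head(feats)


def emd_loss(pred, target):
    # Squared Earth Mover's Distance between discrete score distributions.
    cdf_diff = torch.cumsum(pred, dim=1) - torch.cumsum(target, dim=1)
    return torch.sqrt((cdf_diff ** 2).mean(dim=1)).mean()


def alternating_step(model, optimizer, sal_batch, aes_batch):
    # Saliency update: MSE between predicted and ground-truth maps.
    images_s, sal_gt = sal_batch
    optimizer.zero_grad()
    sal_pred, _ = model(images_s)
    sal_pred = F.interpolate(sal_pred, size=sal_gt.shape[-2:],
                             mode='bilinear', align_corners=False)
    F.mse_loss(sal_pred, sal_gt).backward()
    optimizer.step()

    # Aesthetics update: EMD between predicted and annotated distributions.
    images_a, dist_gt = aes_batch
    optimizer.zero_grad()
    _, dist_pred = model(images_a)
    emd_loss(dist_pred, dist_gt).backward()
    optimizer.step()
```

Keeping the saliency branch fully convolutional in this sketch is what would make the model resolution-friendly: no fixed-size fully connected layer constrains the input resolution.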
Pages: 2014-2018
Number of pages: 5