Salient Object Detection via Dynamic Scale Routing

Cited by: 29
Authors
Wu, Zhenyu [1]
Li, Shuai [1, 2]
Chen, Chenglizhao [1]
Qin, Hong [3]
Hao, Aimin [1, 2, 4]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
[4] Chinese Acad Med Sci, Res Unit Virtual Human & Virtual Surg 2019RU004, Beijing 100006, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Kernel; Decoding; Feature extraction; Routing; Deep learning; Object detection; Technological innovation; Dynamic scale routing; scale-aware feature aggregation; salient object detection; NETWORK; FUSION;
DOI
10.1109/TIP.2022.3214332
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent advances in salient object detection (SOD) can largely be attributed to ever-stronger multi-scale feature representations enabled by deep learning. Existing deep SOD models extract multi-scale features with off-the-shelf encoders and combine them with various carefully designed decoders. However, the kernel sizes in this commonly used pipeline are usually "fixed". In our new experiments, we observed that small kernels are preferable in scenes containing tiny salient objects, whereas large kernels perform better for images with large salient objects. Inspired by this observation, we advocate "dynamic" scale routing (as a brand-new idea) in this paper. It results in a generic plug-in that can directly fit existing feature backbones. This paper's key technical innovations are two-fold. First, instead of using vanilla convolution with fixed kernel sizes for the encoder design, we propose the dynamic pyramid convolution (DPConv), which dynamically selects the best-suited kernel sizes w.r.t. the given input. Second, we provide a self-adaptive bidirectional decoder design to best accommodate the DPConv-based encoder. The most significant highlight is its capability of routing between feature scales and collecting them dynamically, making the inference process scale-aware. As a result, this paper further improves on the current SOTA performance. Both the code and dataset are publicly available at https://github.com/wuzhenyubuaa/DPNet.
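The idea of choosing kernel sizes according to the input can be illustrated with a small sketch. This is not the authors' DPConv (see the linked repository for that); it is a generic soft-routing example under stated assumptions: a single-channel feature map, fixed mean-filter kernels of sizes 1, 3 and 5, and a hypothetical linear gate (`gate_w`, `gate_b`) that maps a global descriptor of the input to one weight per kernel size. All function and parameter names here are illustrative.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive zero-padded 'same' 2D cross-correlation on a single-channel map."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
    return out

def softmax(z):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dynamic_scale_route(x, kernels, gate_w, gate_b):
    """Blend branches with different kernel sizes using input-dependent weights.

    gate_w and gate_b are hypothetical gating parameters (an assumption of
    this sketch): they map a global descriptor of x to one logit per kernel.
    """
    descriptor = np.array([x.mean(), x.std()])       # global summary of the input
    weights = softmax(gate_w @ descriptor + gate_b)  # one weight per kernel size
    branches = [conv2d_same(x, k) for k in kernels]  # one output per kernel size
    out = sum(w * b for w, b in zip(weights, branches))
    return out, weights

# Usage: three odd-sized mean filters and a randomly initialised gate.
rng = np.random.default_rng(0)
x = rng.random((8, 8))
kernels = [np.full((k, k), 1.0 / (k * k)) for k in (1, 3, 5)]
gate_w = rng.normal(size=(3, 2))
gate_b = np.zeros(3)
out, weights = dynamic_scale_route(x, kernels, gate_w, gate_b)
```

In a trained model the gate would be learned, so that inputs with tiny salient objects shift weight toward the small kernels and inputs with large objects toward the large ones; here the gate is random and only shows the data flow.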
Pages: 6649 - 6663
Number of pages: 15
Related papers
87 in total
  • [1] Achanta R, 2009, PROC CVPR IEEE, P1597, DOI 10.1109/CVPRW.2009.5206596
  • [2] Bolukbasi T., 2017, PR MACH LEARN RES, P527
  • [3] Salient Object Detection: A Benchmark
    Borji, Ali
    Cheng, Ming-Ming
    Jiang, Huaizu
    Li, Jia
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (12) : 5706 - 5722
  • [4] Video Saliency Detection via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion
    Chen, Chenglizhao
    Li, Shuai
    Wang, Yongguang
    Qin, Hong
    Hao, Aimin
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (07) : 3156 - 3170
  • [5] Robust salient motion detection in non-stationary videos via novel integrated strategies of spatio-temporal coherency clues and low-rank analysis
    Chen, Chenglizhao
    Li, Shuai
    Qin, Hong
    Hao, Aimin
    [J]. PATTERN RECOGNITION, 2016, 52 : 410 - 432
  • [6] Structure-Sensitive Saliency Detection via Multilevel Rank Analysis in Intrinsic Feature Space
    Chen, Chenglizhao
    Li, Shuai
    Qin, Hong
    Hao, Aimin
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (08) : 2303 - 2316
  • [7] Reverse Attention for Salient Object Detection
    Chen, Shuhan
    Tan, Xiuli
    Wang, Ben
    Hu, Xuelong
    [J]. COMPUTER VISION - ECCV 2018, PT IX, 2018, 11213 : 236 - 252
  • [8] Reverse Attention-Based Residual Network for Salient Object Detection
    Chen, Shuhan
    Tan, Xiuli
    Wang, Ben
    Lu, Huchuan
    Hu, Xuelong
    Fu, Yun
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 3763 - 3776
  • [9] Dynamic Convolution: Attention over Convolution Kernels
    Chen, Yinpeng
    Dai, Xiyang
    Liu, Mengchen
    Chen, Dongdong
    Yuan, Lu
    Liu, Zicheng
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 11027 - 11036
  • [10] Chen ZY, 2020, AAAI CONF ARTIF INTE, V34, P10599