Mutual Information Regularization for Weakly-Supervised RGB-D Salient Object Detection

Cited by: 31

Authors:

Li, Aixuan ^{[1,2]}

Mao, Yuxin ^{[1,2]}

Zhang, Jing ^{[3]}

Dai, Yuchao ^{[1,2]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710129, Peoples R China

[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China

[3] Australian Natl Univ, Sch Comp, Canberra, ACT 2601, Australia

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Saliency detection; Mutual information; Predictive models; Object detection; Data mining; Training; Weakly-supervised; salient object detection; mutual information regularization; NETWORK; FUSION;

DOI:

10.1109/TCSVT.2023.3285249

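Chinese Library Classification number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808; 0809;

Abstract: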

In this paper, we present a weakly-supervised RGB-D salient object detection model trained with scribble supervision. Because this is a multimodal learning task, we focus on effective multimodal representation learning via inter-modal mutual information regularization. In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage a disentangled representation of each modality for salient object detection. Based on our multimodal representation learning framework, we introduce an asymmetric feature extractor for our multimodal data, which proves more effective than the conventional symmetric backbone setting. We also introduce a multimodal variational auto-encoder as a stochastic prediction refinement technique, which takes pseudo labels from the first training stage as supervision and generates refined predictions. Experimental results on benchmark RGB-D salient object detection datasets verify the effectiveness of both our explicit multimodal disentangled representation learning method and the stochastic prediction refinement strategy, achieving performance comparable to state-of-the-art fully supervised models. Our code and data are available at: https://npucvr.github.io/MIRV/.

References

Pages: 397-410

Number of pages: 14

131 entries in total

[51] Pseudo-mask Matters in Weakly-supervised Semantic Segmentation [J].

Li, Yi; Kuang, Zhanghui; Liu, Liyang; Chen, Yimin; Zhang, Wayne.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :6944-6953

[52] Co-saliency Detection with Graph Matching [J].

Li, Zun; Lang, Congyan; Feng, Jiashi; Li, Yidong; Wang, Tao; Feng, Songhe.

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2019, 10 (03)

[53] Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation [J].

Liang, Zhiyuan; Wang, Tiancai; Zhang, Xiangyu; Sun, Jian; Shen, Jianbing.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :16886-16895

[54] Visual Saliency Transformer [J].

Liu, Nian; Zhang, Ni; Wan, Kaiyuan; Shao, Ling; Han, Junwei.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :4702-4712

[55] Salient Object Detection via Two-Stage Graphs [J].

Liu, Yi; Han, Jungong; Zhang, Qiang; Wang, Long.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (04) :1023-1037

[56] Weakly-Supervised Salient Object Detection With Saliency Bounding Boxes [J].

Liu, Yuxuan; Wang, Pengjie; Cao, Ying; Liang, Zijian; Lau, Rynson W. H.

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 :4423-4435

[57] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [J].

Liu, Ze; Lin, Yutong; Cao, Yue; Hu, Han; Wei, Yixuan; Zhang, Zheng; Lin, Stephen; Guo, Baining.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :9992-10002

[58] SwinNet: Swin Transformer Drives Edge-Aware RGB-D and RGB-T Salient Object Detection [J].

Liu, Zhengyi; Tan, Yacheng; He, Qian; Xiao, Yun.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (07) :4486-4497

[59] Learning optimal seeds for diffusion-based salient object detection [J].

Lu, Song; Mahadevan, Vijay; Vasconcelos, Nuno.

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :2790-2797

[60] Ma XZ, 2021, arXiv, DOI arXiv:2004.11820
