Generative Transformer for Accurate and Reliable Salient Object Detection

Cited by: 0

Authors:

Mao, Yuxin ^{[1,2]}

Zhang, Jing ^{[3]}

Wan, Zhexiong ^{[1,2]}

Tian, Xinyu ^{[1,2]}

Li, Aixuan ^{[1,2]}

Lv, Yunqiu ^{[1,2]}

Dai, Yuchao ^{[1,2]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China

[2] Shaanxi Key Lab Informat Acquisit & Proc, Xian 710072, Peoples R China

[3] Australian Natl Univ, Sch Comp, Canberra, ACT 2601, Australia

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2025 / Vol. 35 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Context modeling; Predictive models; Object detection; Accuracy; Reliability; Generative adversarial networks; Feature extraction; Decoding; Visualization; Vision transformer; salient object detection; inferential generative adversarial network; ATTENTION; NETWORK;

DOI:

10.1109/TCSVT.2024.3469286

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

We explore the impact of transformers on accurate and reliable salient object detection. For accuracy, we integrate the transformer with a deterministic model and delineate its advantages in structural modeling. Regarding reliability, we address the transformer's tendency to produce overly confident, incorrect predictions. To gauge reliability implicitly, we introduce a latent variable model within the transformer framework, termed the inferential generative adversarial network (iGAN). The stochastic nature of the latent variable facilitates the estimation of predictive uncertainty, which serves as an auxiliary measure of the model's prediction reliability. Unlike the conventional GAN, which fixes the distribution of the latent variable to the standard normal distribution N(0, I), the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply our proposed iGAN to fully supervised salient object detection, showing that iGAN within the transformer framework leads to both accurate and reliable salient object detection.
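
The abstract names Langevin dynamics as the gradient-based MCMC used to infer the latent variable. As a rough illustration only, the sketch below runs unadjusted Langevin dynamics on a toy one-dimensional target. It is not the authors' implementation: the Gaussian target, the step size, the number of steps, and all function names are assumptions made for this example. In the paper's setting the gradient would presumably come from back-propagation through the network rather than from a closed-form expression.

```python
import math
import random


def langevin_sample(grad_log_p, z0, step_size, n_steps, rng):
    """Run unadjusted Langevin dynamics and return every visited state."""
    z = z0
    chain = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Drift along the gradient of the log-density, then add Gaussian noise.
        z = z + 0.5 * step_size ** 2 * grad_log_p(z) + step_size * noise
        chain.append(z)
    return chain


# Hypothetical 1-D target: a Gaussian with mean 2.0 and std 0.5, standing in
# for the posterior over the latent variable (an assumption for illustration).
MU, SIGMA = 2.0, 0.5


def grad_log_target(z):
    """Gradient of log N(z; MU, SIGMA^2) with respect to z."""
    return -(z - MU) / SIGMA ** 2


rng = random.Random(0)
z_init = rng.gauss(0.0, 1.0)  # initialise from the N(0, 1) prior
chain = langevin_sample(grad_log_target, z_init, step_size=0.2, n_steps=6000, rng=rng)
samples = chain[1000:]  # discard the first 1000 steps as burn-in
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"sample mean={mean:.2f}, sample std={std:.2f}")
```

Starting from a draw of the fixed prior and moving it with gradient steps is what makes the resulting latent variable depend on the target, in contrast to sampling from N(0, I) alone. The spread of the samples is the kind of quantity that can be read as predictive uncertainty.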

Pages: 1041-1054

Page count: 14

50 items in total

[41] Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection
Deng, Xinhao
Zhang, Pingping
Liu, Wei
Lu, Huchuan
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7413 - 7423
[42] Lightweight cross-modal transformer for RGB-D salient object detection
Huang, Nianchang
Yang, Yang
Zhang, Qiang
Han, Jungong
Huang, Jin
COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
[43] Adaptive Spatial Tokenization Transformer for Salient Object Detection in Optical Remote Sensing Images
Gao, Lina
Liu, Bing
Fu, Ping
Xu, Mingzhu
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
[44] Transformer-Based Light Field Salient Object Detection and Its Application to Autofocus
Jiang, Yao
Li, Xin
Fu, Keren
Zhao, Qijun
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 6647 - 6659
[45] CATNet: A Cascaded and Aggregated Transformer Network for RGB-D Salient Object Detection
Sun, Fuming
Ren, Peng
Yin, Bowen
Wang, Fasheng
Li, Haojie
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2249 - 2262
[46] Bidirectional mutual guidance transformer for salient object detection in optical remote sensing images
Huang, Kan
Tian, Chunwei
Li, Ge
INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (13) : 4016 - 4033
[47] Mutual-Guidance Transformer-Embedding Network for Video Salient Object Detection
Min, Dingyao
Zhang, Chao
Lu, Yukang
Fu, Keren
Zhao, Qijun
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1674 - 1678
[48] Salient Object Detection by Composition
Feng, Jie
Wei, Yichen
Tao, Litian
Zhang, Chao
Sun, Jian
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 1028 - 1035
[49] Spectral salient object detection
Fu, Keren
Gu, Irene Yu-Hua
Yang, Jie
NEUROCOMPUTING, 2018, 275 : 788 - 803
[50] Salient object detection: A survey
Borji, Ali
Cheng, Ming-Ming
Hou, Qibin
Jiang, Huaizu
Li, Jia
COMPUTATIONAL VISUAL MEDIA, 2019, 5 (02) : 117 - 150