Generative Transformer for Accurate and Reliable Salient Object Detection

Cited by: 0

Authors:

Mao, Yuxin [1,2]

Zhang, Jing [3]

Wan, Zhexiong [1,2]

Tian, Xinyu [1,2]

Li, Aixuan [1,2]

Lv, Yunqiu [1,2]

Dai, Yuchao [1,2]

Affiliations:

[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China

[2] Shaanxi Key Lab Informat Acquisit & Proc, Xian 710072, Peoples R China

[3] Australian Natl Univ, Sch Comp, Canberra, ACT 2601, Australia

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2025 / Vol. 35 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Context modeling; Predictive models; Object detection; Accuracy; Reliability; Generative adversarial networks; Feature extraction; Decoding; Visualization; Vision transformer; salient object detection; inferential generative adversarial network; ATTENTION; NETWORK;

DOI:

10.1109/TCSVT.2024.3469286

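Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Subject classification codes:

0808 ; 0809 ;

Abstract: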

We explore the impact of transformers on accurate and reliable salient object detection. For accuracy, we integrate the transformer with a deterministic model and delineate its advantages in structural modeling. Regarding reliability, we address the transformer's tendency to produce overly confident, incorrect predictions. To gauge reliability implicitly, we introduce a latent variable model within the transformer framework, termed the inferential generative adversarial network (iGAN). The stochastic nature of the latent variable facilitates the estimation of predictive uncertainty, which serves as an auxiliary measure of the model's prediction reliability. Unlike the conventional GAN, which fixes the distribution of the latent variable to the standard normal distribution N(0, I), the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply the proposed iGAN to fully supervised salient object detection, showing that iGAN within the transformer framework leads to both accurate and reliable salient object detection.
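
The Langevin-dynamics inference mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's iGAN: it assumes a hypothetical one-dimensional model with prior z ~ N(0, 1) and likelihood y ~ N(w·z, sigma²), and all function names and parameter values (`grad_log_joint`, `langevin_infer`, `w`, `sigma`, `step_size`) are illustrative assumptions. It only shows how a latent variable that starts from the N(0, 1) prior is moved by noisy gradient steps toward a distribution that depends on the observed input.

```python
import random


def grad_log_joint(z, y, w=2.0, sigma=0.5):
    """Gradient w.r.t. z of log p(z) + log p(y | z) for a toy model.

    Toy assumption (not from the paper): prior z ~ N(0, 1) and
    likelihood y ~ N(w * z, sigma^2).
    """
    return -z + w * (y - w * z) / sigma ** 2


def langevin_infer(y, steps=5000, step_size=0.1, seed=0):
    """Run unadjusted Langevin dynamics and return every visited z."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)  # initialise from the N(0, 1) prior
    samples = []
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        # Langevin update: a gradient step on the log joint plus Gaussian noise.
        z = z + 0.5 * step_size ** 2 * grad_log_joint(z, y) + step_size * noise
        samples.append(z)
    return samples


def posterior_mean(y, w=2.0, sigma=0.5):
    """Closed-form posterior mean of z for the toy Gaussian model."""
    return (w * y / sigma ** 2) / (1.0 + w ** 2 / sigma ** 2)


if __name__ == "__main__":
    for y in (1.0, -1.0):
        kept = langevin_infer(y)[500:]  # drop early samples as burn-in
        estimate = sum(kept) / len(kept)
        print(f"y={y:+.1f}  Langevin mean={estimate:+.3f}  "
              f"exact mean={posterior_mean(y):+.3f}")
```

Running the script prints the Langevin estimate next to the closed-form posterior mean for two different observations, which makes the input dependence visible: the same prior leads to different inferred latent values once the gradient of the log joint involves the observation.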


Pages: 1041-1054

Page count: 14
