Generative Transformer for Accurate and Reliable Salient Object Detection

Cited by: 0

Authors:

Mao, Yuxin [1,2]

Zhang, Jing [3]

Wan, Zhexiong [1,2]

Tian, Xinyu [1,2]

Li, Aixuan [1,2]

Lv, Yunqiu [1,2]

Dai, Yuchao [1,2]

Affiliations:

[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China

[2] Shaanxi Key Lab Informat Acquisit & Proc, Xian 710072, Peoples R China

[3] Australian Natl Univ, Sch Comp, Canberra, ACT 2601, Australia

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2025, Vol. 35, No. 2

Funding:

National Natural Science Foundation of China

Keywords:

Transformers; Context modeling; Predictive models; Object detection; Accuracy; Reliability; Generative adversarial networks; Feature extraction; Decoding; Visualization; Vision transformer; salient object detection; inferential generative adversarial network; ATTENTION; NETWORK;

DOI:

10.1109/TCSVT.2024.3469286

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

We explore the impact of transformers on accurate and reliable salient object detection. For accuracy, we integrate the transformer with a deterministic model and delineate its advantages in structural modeling. Regarding reliability, we address the transformer's tendency to produce overly confident, incorrect predictions. To gauge reliability implicitly, we introduce a latent variable model within the transformer framework, termed the inferential generative adversarial network (iGAN). The stochastic nature of the latent variable facilitates the estimation of predictive uncertainty, which serves as an auxiliary measure of the model's prediction reliability. Unlike the conventional GAN, which fixes the distribution of the latent variable as the standard normal distribution N(0, I), the proposed iGAN infers the latent variable by gradient-based Markov Chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply the proposed iGAN to fully supervised salient object detection and show that iGAN within the transformer framework leads to both accurate and reliable salient object detection.
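As a rough illustration of the Langevin dynamics mentioned in the abstract, the sketch below runs the generic unadjusted Langevin update z <- z + (s^2 / 2) * grad log p(z | x) + s * eps on a toy Gaussian target. This is not the authors' implementation: the function names, the step size, the number of steps, and the toy posterior N(x, I) are all assumptions chosen only to show how the latent variable can become input-dependent instead of being drawn from a fixed N(0, I).

```python
import numpy as np


def langevin_sample(grad_log_density, z0, step_size=0.3, n_steps=200, rng=None):
    """Unadjusted Langevin dynamics (illustrative sketch, hypothetical helper).

    Each step moves z along the gradient of the log-density and adds
    Gaussian noise scaled by the step size.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + 0.5 * step_size**2 * grad_log_density(z) + step_size * noise
    return z


def toy_grad_log_posterior(z, x):
    # Assumption: a toy posterior p(z | x) = N(x, I), so the gradient is -(z - x).
    # In a real model this gradient would come from backpropagation.
    return -(z - x)


rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])              # stands in for an input-dependent target
z0 = rng.standard_normal((5000, 2))    # chains initialised from N(0, I)
samples = langevin_sample(lambda z: toy_grad_log_posterior(z, x), z0, rng=rng)
```

Starting from N(0, I), the chains drift toward the input-dependent mean x; the spread of the final samples is what a latent variable model would use to estimate predictive uncertainty.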


Pages: 1041-1054

Page count: 14
