TANet: Transformer-based asymmetric network for RGB-D salient object detection

Cited by: 6

Authors:

Liu, Chang ^{[1]}

Yang, Gang ^{[1,3]}

Wang, Shuo ^{[1]}

Wang, Hangxu ^{[1,2]}

Zhang, Yunhua ^{[1]}

Wang, Yutao ^{[1]}

Affiliations:

[1] Northeastern Univ, Shenyang, Liaoning, Peoples R China

[2] DUT Artificial Intelligence Inst, Dalian, Peoples R China

[3] Northeastern Univ, Wenhua Rd, Shenyang 110000, Liaoning, Peoples R China

Source:

IET COMPUTER VISION | 2023 / Vol. 17 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

computer vision; image segmentation; object detection; REGION;

DOI:

10.1049/cvi2.12177

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing RGB-D salient object detection methods mainly rely on a symmetric two-stream Convolutional Neural Network (CNN)-based network to extract RGB and depth channel features separately. However, there are two problems with the symmetric conventional network structure: first, the ability of CNN in learning global contexts is limited; second, the symmetric two-stream structure ignores the inherent differences between modalities. In this study, a Transformer-based asymmetric network is proposed to tackle the issues mentioned above. The authors employ the powerful feature extraction capability of Transformer to extract global semantic information from RGB data and design a lightweight CNN backbone to extract spatial structure information from depth data without pre-training. The asymmetric hybrid encoder effectively reduces the number of parameters in the model while increasing speed without sacrificing performance. Then, a cross-modal feature fusion module which enhances and fuses RGB and depth features with each other is designed. Finally, the authors add edge prediction as an auxiliary task and propose an edge enhancement module to generate sharper contours. Extensive experiments demonstrate that our method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets. The code of the authors will be released at .

Citation

Pages: 415-430

Page count: 16

