TANet: Transformer-based asymmetric network for RGB-D salient object detection

Cited by: 6
Authors
Liu, Chang [1]
Yang, Gang [1,3]
Wang, Shuo [1]
Wang, Hangxu [1,2]
Zhang, Yunhua [1]
Wang, Yutao [1]
Affiliations
[1] Northeastern Univ, Shenyang, Liaoning, Peoples R China
[2] DUT Artificial Intelligence Inst, Dalian, Peoples R China
[3] Northeastern Univ, Wenhua Rd, Shenyang 110000, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
computer vision; image segmentation; object detection; REGION;
DOI
10.1049/cvi2.12177
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing RGB-D salient object detection methods mainly rely on a symmetric two-stream network based on Convolutional Neural Networks (CNNs) to extract RGB and depth features separately. However, this conventional symmetric structure has two problems: first, the ability of CNNs to learn global context is limited; second, the symmetric two-stream structure ignores the inherent differences between the modalities. In this study, a Transformer-based asymmetric network is proposed to tackle these issues. The authors use the feature extraction capability of the Transformer to extract global semantic information from RGB data, and design a lightweight CNN backbone, which requires no pre-training, to extract spatial structure information from depth data. The asymmetric hybrid encoder reduces the number of model parameters and increases speed without sacrificing performance. Then, a cross-modal feature fusion module is designed in which RGB and depth features enhance each other before being fused. Finally, the authors add edge prediction as an auxiliary task and propose an edge enhancement module to generate sharper contours. Extensive experiments demonstrate that the proposed method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets. The authors' code will be released at .
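The abstract above does not specify how the cross-modal feature fusion module is built, so the following is only a minimal sketch of the general idea of mutual enhancement followed by fusion. Everything in it is an assumption made for illustration: the function names (`channel_gate`, `enhance`, `cross_modal_fuse`), the sigmoid channel gate, the residual reweighting, and the element-wise sum are hypothetical choices, not the authors' design. Features are represented as plain nested lists of shape channels x flattened spatial positions.

```python
import math
from typing import List

# Hypothetical feature layout: channels x flattened spatial positions (H*W).
Feature = List[List[float]]


def channel_gate(feat: Feature) -> List[float]:
    """Average each channel over space, then squash to (0, 1) with a sigmoid."""
    return [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in feat]


def enhance(target: Feature, gate: List[float]) -> Feature:
    """Residual reweighting per channel: target + gate * target."""
    return [[v * (1.0 + g) for v in ch] for ch, g in zip(target, gate)]


def cross_modal_fuse(rgb: Feature, depth: Feature) -> Feature:
    """Let each modality gate the other, then fuse by element-wise sum.

    This is an illustrative stand-in, not the module from the paper.
    """
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("rgb and depth features must share the same shape")
    rgb_enh = enhance(rgb, channel_gate(depth))    # depth guides RGB
    depth_enh = enhance(depth, channel_gate(rgb))  # RGB guides depth
    return [[a + b for a, b in zip(r, d)] for r, d in zip(rgb_enh, depth_enh)]


if __name__ == "__main__":
    rgb = [[0.0, 0.0], [0.0, 0.0]]
    depth = [[1.0, 1.0], [2.0, 2.0]]
    print(cross_modal_fuse(rgb, depth))
```

In this sketch the gate computed from one modality only rescales the other modality's channels, so both inputs must already share a shape; a real asymmetric encoder would first need to project the Transformer and CNN features to a common resolution and channel count.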
Pages: 415-430
Number of pages: 16