UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection

Cited by: 15
Authors
Guo, Ruohao [1 ]
Ying, Xianghua [1 ]
Qi, Yanyu [2 ]
Qu, Liao [3 ]
Affiliations
[1] Peking Univ, Sch Intelligence Sci & Technol, Natl Key Lab Gen Artificial Intelligence, Beijing 100871, Peoples R China
[2] China Agr Univ, Coll Informat & Elect Engn, Beijing 100091, Peoples R China
[3] Carnegie Mellon Univ, Elect & Comp Engn Dept, Pittsburgh, PA 15213 USA
Funding
National Natural Science Foundation of China
Keywords
Object detection; Feature extraction; Task analysis; Transformers; Image segmentation; Semantics; Computer architecture; Co-object segmentation; Multi-modal salient object detection; Transformer; Deep learning; Segmentation; Graph; Optimization; Refinement; Network; Deep
DOI
10.1109/TMM.2024.3369922
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recent years have witnessed a growing interest in co-object segmentation and multi-modal salient object detection. Many efforts have been devoted to segmenting co-existing objects across a group of images or to detecting salient objects from different modalities. Despite appreciable performance on their respective benchmarks, each of these methods is limited to a specific task and does not generalize to the others. In this paper, we develop a Unified TRansformer-based framework, namely UniTR, that tackles each of the above tasks with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency among relevant objects or the complementarity across different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art approaches.
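The abstract describes CoFormer only at a high level, so the following Python (PyTorch) sketch is purely illustrative: it shows one common way a transformer block can exchange information between two token streams, e.g. two images of a group (consistency) or an RGB and a thermal input (complementarity). All class names, shapes, and hyperparameters below are assumptions for illustration, not the authors' implementation.

# Illustrative sketch only; not the UniTR/CoFormer code. Module name, shapes,
# and hyperparameters are assumptions made for this example.
import torch
import torch.nn as nn

class CrossStreamBlock(nn.Module):
    """Bidirectional cross-attention between two token streams
    (a hypothetical stand-in for a CoFormer-style module)."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        # Each stream queries the other, so shared (consistent) or
        # complementary cues flow in both directions.
        self.attn_ab = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # a, b: (batch, tokens, dim) token sequences from the two streams.
        qa, qb = self.norm_a(a), self.norm_b(b)
        a = a + self.attn_ab(qa, qb, qb, need_weights=False)[0]  # a attends to b
        b = b + self.attn_ba(qb, qa, qa, need_weights=False)[0]  # b attends to a
        a, b = a + self.ffn(a), b + self.ffn(b)                  # per-stream FFN
        return a, b

if __name__ == "__main__":
    blk = CrossStreamBlock()
    rgb = torch.randn(2, 196, 256)       # e.g. 14x14 backbone tokens
    aux = torch.randn(2, 196, 256)       # second image or second modality
    out_rgb, out_aux = blk(rgb, aux)
    print(out_rgb.shape, out_aux.shape)  # torch.Size([2, 196, 256]) each

A single bidirectional block of this kind can serve both settings described in the abstract, since the same attention mechanism aggregates whatever the second stream offers, which matches the unified spirit the paper claims; how UniTR actually realizes this would need to be checked against the published article.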
Pages: 7622-7635
Page count: 14