UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection

Cited by: 15
Authors
Guo, Ruohao [1 ]
Ying, Xianghua [1 ]
Qi, Yanyu [2 ]
Qu, Liao [3 ]
Affiliations
[1] Peking Univ, Sch Intelligence Sci & Technol, Natl Key Lab Gen Artificial Intelligence, Beijing 100871, Peoples R China
[2] China Agr Univ, Coll Informat & Elect Engn, Beijing 100091, Peoples R China
[3] Carnegie Mellon Univ, Elect & Comp Engn Dept, Pittsburgh, PA 15213 USA
Funding
National Natural Science Foundation of China
Keywords
Object detection; Feature extraction; Task analysis; Transformers; Image segmentation; Semantics; Computer architecture; Co-object segmentation; Multi-modal salient object detection; Transformer; Deep learning; Segmentation; Graph; Optimization; Refinement; Network; Deep
DOI
10.1109/TMM.2024.3369922
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recent years have witnessed a growing interest in co-object segmentation and multi-modal salient object detection. Many efforts have been devoted to segmenting co-existing objects across a group of images or to detecting salient objects from different modalities. Despite appreciable performance on their respective benchmarks, each of these methods is limited to a specific task and does not generalize to the others. In this paper, we develop a Unified TRansformer-based framework, namely UniTR, that tackles each of the above tasks with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency among relevant objects or the complementarity across different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art approaches.
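The abstract describes CoFormer only at a high level, so the following Python (PyTorch) sketch is purely illustrative: it shows one common way a transformer block can exchange information between two token streams, e.g. two images of a group (consistency) or an RGB and a thermal input (complementarity). All class names, shapes, and hyperparameters below are assumptions for illustration, not the authors' implementation.

# Illustrative sketch only; not the UniTR/CoFormer code. Module name, shapes,
# and hyperparameters are assumptions made for this example.
import torch
import torch.nn as nn

class CrossStreamBlock(nn.Module):
    """Bidirectional cross-attention between two token streams
    (a hypothetical stand-in for a CoFormer-style module)."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        # Each stream queries the other, so shared (consistent) or
        # complementary cues flow in both directions.
        self.attn_ab = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # a, b: (batch, tokens, dim) token sequences from the two streams.
        qa, qb = self.norm_a(a), self.norm_b(b)
        a = a + self.attn_ab(qa, qb, qb, need_weights=False)[0]  # a attends to b
        b = b + self.attn_ba(qb, qa, qa, need_weights=False)[0]  # b attends to a
        a, b = a + self.ffn(a), b + self.ffn(b)                  # per-stream FFN
        return a, b

if __name__ == "__main__":
    blk = CrossStreamBlock()
    rgb = torch.randn(2, 196, 256)       # e.g. 14x14 backbone tokens
    aux = torch.randn(2, 196, 256)       # second image or second modality
    out_rgb, out_aux = blk(rgb, aux)
    print(out_rgb.shape, out_aux.shape)  # torch.Size([2, 196, 256]) each

A single bidirectional block of this kind can serve both settings described in the abstract, since the same attention mechanism aggregates whatever the second stream offers, which matches the unified spirit the paper claims; how UniTR actually realizes this would need to be checked against the published article.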
Pages: 7622-7635
Page count: 14