CTIF-Net: A CNN-Transformer Iterative Fusion Network for Salient Object Detection

Cited by: 15

Authors:

Yuan, Junbin [1]

Zhu, Aiqing [1]

Xu, Qingzhen [1]

Wattanachote, Kanoksak [2]

Gong, Yongyi [2]

Affiliations:

[1] South China Normal Univ, Sch Comp Sci, Guangzhou 510631, Peoples R China

[2] Guangdong Univ Foreign Studies, Sch Informat Sci & Technol, Intelligent Hlth & Visual Comp Lab, Guangzhou 510006, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 05

Keywords:

CNN; transformer; iterative fusion; salient object detection; ATTENTION; MODEL;

DOI:

10.1109/TCSVT.2023.3321190

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Capturing sufficient global context and rich spatial structure information is critical for dense prediction tasks. Convolutional Neural Networks (CNNs) are particularly adept at modeling fine-grained local features, while Transformers excel at modeling global context information. The two therefore exhibit complementary characteristics. Designing a network that efficiently fuses these two models to fully leverage their strengths and achieve more accurate detection is a promising and worthwhile research topic. In this paper, we introduce a novel CNN-Transformer Iterative Fusion Network (CTIF-Net) for salient object detection (SOD). It efficiently combines CNN and Transformer to achieve superior performance by using a parallel dual encoder structure and a feature iterative fusion module. Firstly, CTIF-Net extracts features from the image using the CNN and the Transformer, respectively. Secondly, two feature convertors and a feature iterative fusion module are employed to combine and iteratively refine the two sets of features. The experimental results on multiple SOD datasets show that CTIF-Net outperforms 17 state-of-the-art methods, achieving better results on mainstream evaluation metrics such as F-measure, S-measure, and MAE. Code can be found at https://github.com/danielfaster/CTIF-Net/.
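Two of the metrics named in the abstract, MAE and F-measure, have standard definitions that can be sketched in a few lines. The snippet below is an illustration only and is not taken from the CTIF-Net repository: the function names `mae` and `f_measure`, the fixed threshold of 0.5, and the weighting `beta2 = 0.3` (a value commonly used in the SOD literature) are assumptions made for this example. Saliency maps are represented as flat lists of values in [0, 1].

```python
# Illustrative sketch only: textbook definitions of MAE and F-measure for
# saliency maps. Names and default parameters are assumptions, not the
# evaluation code used by the CTIF-Net authors.

def mae(pred, gt):
    """Mean absolute error between a predicted map and the ground truth.

    Both inputs are flat, equally sized lists of values in [0, 1].
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally sized")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure of a thresholded prediction against a binary ground truth.

    beta2 weights precision against recall; 0.3 is an assumed default.
    The threshold of 0.5 is an arbitrary illustrative choice.
    """
    binary = [1 if p >= threshold else 0 for p in pred]
    true_pos = sum(1 for b, g in zip(binary, gt) if b == 1 and g == 1)
    if true_pos == 0:
        # No correctly detected salient pixel: define the score as zero.
        return 0.0
    precision = true_pos / sum(binary)
    recall = true_pos / sum(1 for g in gt if g == 1)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


if __name__ == "__main__":
    pred = [0.9, 0.8, 0.2, 0.1]  # hypothetical predicted saliency values
    gt = [1, 1, 0, 0]            # hypothetical binary ground truth
    print("MAE:", mae(pred, gt))
    print("F-measure:", f_measure(pred, gt))
```

Lower MAE is better, while higher F-measure is better. Published SOD evaluations typically report F-measure over a range of thresholds rather than at one fixed value, so this sketch should not be used to reproduce the paper's numbers.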


Pages: 3795 - 3805

Number of pages: 11

50 items in total

[31] Feature extraction and fusion network for salient object detection
Dai, Chao
Pan, Chen
He, Wei
Multimedia Tools and Applications, 2022, 81 : 33955 - 33969
[32] Selective feature fusion network for salient object detection
Sun, Fengming
Yuan, Xia
Zhao, Chunxia
IET COMPUTER VISION, 2023, 17 (04) : 483 - 495
[33] Transformers and CNNs fusion network for salient object detection
Yao, Cuili
Feng, Lin
Kong, Yuqiu
Xiao, Lin
Chen, Tao
NEUROCOMPUTING, 2023, 520 : 342 - 355
[34] CNN-TransNet: A Hybrid CNN-Transformer Network With Differential Feature Enhancement for Cloud Detection
Ma, Nan
Sun, Lin
He, Yawen
Zhou, Chenghu
Dong, Chuanxiang
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
[35] Hybrid CNN-transformer network for efficient CSI feedback
Zhao, Ruohan
Liu, Ziang
Song, Tianyu
Jin, Jiyu
Jin, Guiyue
Fan, Lei
PHYSICAL COMMUNICATION, 2024, 66
[36] Image harmonization with Simple Hybrid CNN-Transformer Network
Li, Guanlin
Zhao, Bin
Li, Xuelong
NEURAL NETWORKS, 2024, 180
[37] Hybrid CNN-Transformer Feature Fusion for Single Image Deraining
Chen, Xiang
Pan, Jinshan
Lu, Jiyang
Fan, Zhentao
Li, Hao
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023 : 378 - 386
[38] CNN-Transformer Hybrid Architecture for Early Fire Detection
Yang, Chenyue
Pan, Yixuan
Cao, Yichao
Lu, Xiaobo
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 570 - 581
[39] RGB-D Salient Object Detection by a CNN With Multiple Layers Fusion
Huang, Rui
Xing, Yan
Wang, ZeZheng
IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (04) : 552 - 556
[40] Salient Object Detection based on CNN Fusion of Two Types of Saliency Models
Hassan, Muhammad Umair
Niu, Dongmei
Zhao, Xiuyang
Shohag, Md Shakil Ahamed
Ma, Yingjun
Zhang, Mingxuan
2019 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2019
