Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers

Cited: 91
Authors
Huang, Zhou [1 ,2 ]
Dai, Hang [3 ]
Xiang, Tian-Zhu [4 ]
Wang, Shuo [5 ]
Chen, Huai-Xin [2 ]
Qin, Jie [6 ]
Xiong, Huan [7 ]
Affiliations
[1] Sichuan Changhong Elect Co Ltd, Mianyang, Sichuan, Peoples R China
[2] UESTC, Chengdu, Peoples R China
[3] Univ Glasgow, Glasgow, Lanark, Scotland
[4] G42, Shanghai, Peoples R China
[5] Swiss Fed Inst Technol, Zurich, Switzerland
[6] NUAA, CCST, Nanjing, Peoples R China
[7] MBZUAI, Abu Dhabi, U Arab Emirates
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00538
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection. However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders, both of which hinder camouflaged object detection, a task that relies on subtle cues extracted from indistinguishable backgrounds. To address these issues, in this paper we propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which hierarchically decodes locality-enhanced neighboring transformer features through progressive shrinking for camouflaged object detection. Specifically, we propose a non-local token enhancement module (NL-TEM) that employs the non-local mechanism to enable interaction among neighboring tokens and explores graph-based high-order relations within tokens to enhance the local representations of transformers. Moreover, we design a feature shrinkage decoder (FSD) with adjacent interaction modules (AIM), which progressively aggregates adjacent transformer features through a layer-by-layer shrinkage pyramid to accumulate as many imperceptible but effective cues as possible for object information decoding. Extensive quantitative and qualitative experiments demonstrate that the proposed model significantly outperforms 24 existing competitors on three challenging COD benchmark datasets under six widely used evaluation metrics. Our code is publicly available at https://github.com/ZhouHuang23/FSPNet.
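As an illustration of the non-local mechanism the abstract refers to, the sketch below shows tokens interacting through pairwise affinities so that each token's feature is refined by global context. It is a hedged toy in NumPy, not the authors' NL-TEM implementation: the function name, shapes, and residual connection are assumptions for illustration, and the paper's graph-based high-order relations are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_interaction(tokens):
    """Minimal non-local operation over a token sequence.

    tokens: (N, C) array of N token embeddings with C channels.
    Every token attends to every other via a pairwise affinity
    matrix, and the aggregated context is added back residually,
    as in standard non-local blocks.
    """
    n, c = tokens.shape
    affinity = tokens @ tokens.T / np.sqrt(c)   # (N, N) pairwise similarities
    weights = softmax(affinity, axis=-1)        # each row sums to 1
    context = weights @ tokens                  # (N, C) globally aggregated features
    return tokens + context                     # residual connection

tokens = np.random.default_rng(0).normal(size=(8, 4))
out = non_local_interaction(tokens)
print(out.shape)  # (8, 4)
```

In the paper this kind of interaction is applied between neighboring transformer tokens to strengthen locality modeling; here it is shown over a flat token set only to make the mechanism concrete.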
Pages: 5557-5566
Page count: 10