Category-aware Allocation Transformer for Weakly Supervised Object Localization

Cited by: 6
Authors
Chen, Zhiwei [1 ]
Ding, Jinren [1 ]
Cao, Liujuan [1 ]
Shen, Yunhang [2 ]
Zhang, Shengchuan [1 ]
Jiang, Guannan [3 ]
Ji, Rongrong [1 ]
Affiliations
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Co, Minist Educ China, Xiamen, Peoples R China
[2] Tencent Youtu Lab, Shenzhen, Peoples R China
[3] CATL, Ningde, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.00611
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels as supervision. Recently, transformers have been introduced into WSOL, yielding impressive results. The self-attention mechanism and multilayer perceptron structure in transformers preserve long-range feature dependencies, facilitating localization of the full object extent. However, current transformer-based methods predict bounding boxes from category-agnostic attention maps, which may lead to confused and noisy object localization. To address this issue, we propose a novel Category-aware Allocation TRansformer (CATR) that learns category-aware representations for specific objects and produces corresponding category-aware attention maps for object localization. First, we introduce a Category-aware Stimulation Module (CSM) that induces learnable category biases for self-attention maps, providing auxiliary supervision to guide the learning of more effective transformer representations. Second, we design an Object Constraint Module (OCM) that refines the object regions of the category-aware attention maps in a self-supervised manner. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that the proposed CATR achieves significant and consistent performance improvements over competing approaches.
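The core idea sketched in the abstract — injecting a learnable, per-category bias into the self-attention map so that attention becomes category-aware — can be illustrated with a minimal NumPy sketch. This is an illustration of the general mechanism only, not the paper's CSM implementation; the function name, tensor shapes, and the simple additive-bias form are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def category_aware_attention(q, k, category_bias, label):
    """Scaled dot-product self-attention with an additive per-category bias.

    Hypothetical sketch of the category-bias idea (not the paper's CSM):
      q, k          : (num_tokens, dim) query/key token features
      category_bias : (num_classes, num_tokens, num_tokens) learnable biases
      label         : image-level class index (the only supervision in WSOL)
    """
    dim = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(dim)        # standard self-attention logits
    scores = scores + category_bias[label]   # category-specific additive bias
    return softmax(scores, axis=-1)          # rows form attention distributions

# Toy usage: 5 tokens, 8-dim features, 3 categories.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
k = rng.standard_normal((5, 8))
bias = rng.standard_normal((3, 5, 5))
attn = category_aware_attention(q, k, bias, label=1)
```

In training, `category_bias` would be a learned parameter updated through the classification loss, so each class steers attention toward its own object regions rather than sharing one category-agnostic map.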
Pages: 6620-6629
Page count: 10