Category-aware Allocation Transformer for Weakly Supervised Object Localization

Cited by: 6
Authors
Chen, Zhiwei [1 ]
Ding, Jinren [1 ]
Cao, Liujuan [1 ]
Shen, Yunhang [2 ]
Zhang, Shengchuan [1 ]
Jiang, Guannan [3 ]
Ji, Rongrong [1 ]
Affiliations
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Co, Minist Educ China, Xiamen, Peoples R China
[2] Tencent Youtu Lab, Shenzhen, Peoples R China
[3] CATL, Ningde, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.00611
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels as supervision. Recently, transformers have been introduced into WSOL, yielding impressive results. The self-attention mechanism and multilayer perceptron structure in transformers preserve long-range feature dependencies, facilitating localization of the full object extent. However, current transformer-based methods predict bounding boxes from category-agnostic attention maps, which may lead to confused and noisy object localization. To address this issue, we propose a novel Category-aware Allocation TRansformer (CATR) that learns category-aware representations for specific objects and produces corresponding category-aware attention maps for object localization. First, we introduce a Category-aware Stimulation Module (CSM) that induces learnable category biases for self-attention maps, providing auxiliary supervision to guide the learning of more effective transformer representations. Second, we design an Object Constraint Module (OCM) that refines the object regions of the category-aware attention maps in a self-supervised manner. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that the proposed CATR achieves significant and consistent performance improvements over competing approaches.
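The core idea sketched in the abstract — injecting a learnable, per-category bias into the self-attention map so that attention becomes category-aware — can be illustrated with a minimal NumPy sketch. This is an illustration of the general mechanism only, not the paper's CSM implementation; the function name, tensor shapes, and the simple additive-bias form are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def category_aware_attention(q, k, category_bias, label):
    """Scaled dot-product self-attention with an additive per-category bias.

    Hypothetical sketch of the category-bias idea (not the paper's CSM):
      q, k          : (num_tokens, dim) query/key token features
      category_bias : (num_classes, num_tokens, num_tokens) learnable biases
      label         : image-level class index (the only supervision in WSOL)
    """
    dim = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(dim)        # standard self-attention logits
    scores = scores + category_bias[label]   # category-specific additive bias
    return softmax(scores, axis=-1)          # rows form attention distributions

# Toy usage: 5 tokens, 8-dim features, 3 categories.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
k = rng.standard_normal((5, 8))
bias = rng.standard_normal((3, 5, 5))
attn = category_aware_attention(q, k, bias, label=1)
```

In training, `category_bias` would be a learned parameter updated through the classification loss, so each class steers attention toward its own object regions rather than sharing one category-agnostic map.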
Pages: 6620-6629
Page count: 10