Learning Local Semantic Region Activations for Weakly Supervised Object Localization

Cited by: 1

Authors:

Xu, Can [1]

Hui, Le [2]

Han, Yuehui [1]

Jiang, Haobo [1]

Chen, Jiaxin [1]

Xie, Jin [3,4]

Yang, Jian [1]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China

[2] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China

[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China

[4] Nanjing Univ, Sch Intelligence Sci & Technol, Nanjing 210093, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 10

Keywords:

Location awareness; Semantics; Correlation; Feature extraction; Task analysis; Pipelines; Proposals; Weakly-supervised; object localization; spatial association; semantic spatial activation; CLASSIFICATION;

DOI:

10.1109/TCSVT.2024.3404100

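Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: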

0808 ; 0809 ;

Abstract:

Weakly supervised object localization (WSOL) aims to train instance-level locators using only readily available image-level labels. Most prior works follow the Class Activation Map (CAM) pipeline: they multiply channel-wise features by the classification weights and sum the results to collect semantic responses, thereby highlighting the regions that contribute to the class prediction. However, CAM-based methods treat the class contributions of all pixel positions in a channel equally and assign biased, dominant weights to the discriminative channels. As a result, they fail to express the fine-grained pixel-level semantic response of each channel and to model the complex contextual relations between channels, so the activation values of non-discriminative foreground regions become mixed up with those of the background. To alleviate these issues, we present a Local Semantic activation enhancement and Global Spatial correlation mining network (LSGS-Net) for accurate WSOL. Specifically, we first propose a local activation generation module that explicitly learns the semantic response of each pixel position from the channels. We then design a regularization loss that supervises the consistency between similar local activations, using cross-image information to improve their accuracy. We further propose a K-nearest neighbors graph module that captures the spatial correlation between different local activations and adaptively assigns more appropriate weights when fusing all local activations. At inference, the bounding box is determined using a foreground threshold. Extensive experiments show that LSGS-Net achieves significant and consistent improvements with various backbones on the CUB, ILSVRC, and OpenImages benchmarks, with a GT-Known LOC of 97.5% and 75.3% on CUB and ILSVRC, respectively. For segmentation quality on OpenImages, LSGS-Net exceeds the SOTA method by 1.2% pIoU and 1.9% PxAP.
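
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal NumPy sketch of the standard CAM pipeline (a weighted sum of channel feature maps) followed by threshold-based box extraction. It is not LSGS-Net. The function names `class_activation_map` and `bbox_from_cam`, the toy feature tensor, and the 0.5 threshold are all illustrative assumptions, and the box step simply encloses every above-threshold pixel rather than selecting a connected component.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight each channel by the classifier weight and sum.

    features:   (C, H, W) feature maps from the last conv layer (assumed shape)
    fc_weights: (num_classes, C) weights of the linear classifier (assumed shape)
    """
    w = fc_weights[class_idx]                          # (C,)
    cam = np.tensordot(w, features, axes=([0], [0]))   # (H, W)
    cam = cam - cam.min()                              # shift so the minimum is 0
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1]

def bbox_from_cam(cam, threshold=0.5):
    """Box (x_min, y_min, x_max, y_max) around pixels at or above the threshold.

    Simplification: encloses all foreground pixels, with no connected-component step.
    """
    ys, xs = np.where(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: channel 0 fires on a 2x2 block, channel 1 is a flat background.
features = np.zeros((2, 4, 4))
features[0, 1:3, 1:3] = 1.0
features[1] = 0.2
fc_weights = np.array([[1.0, 0.5], [-1.0, 0.5]])

cam = class_activation_map(features, fc_weights, class_idx=0)
box = bbox_from_cam(cam, threshold=0.5)
print(box)
```

Because every pixel in a channel shares that channel's single classifier weight, the map cannot express per-pixel differences within a channel, which is the limitation the abstract attributes to CAM-based methods.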

Pages: 10182-10193

Page count: 12

