Weakly Supervised Referring Expression Grounding via Dynamic Self-Knowledge Distillation

Cited by: 1

Authors:

Mi, Jinpeng [1]

Chen, Zhiqian [1]

Zhang, Jianwei [2]

Affiliations:

[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence IMI, Shanghai, Peoples R China

[2] Univ Hamburg, Dept Informat, Tech Aspects Multimodal Syst TAMS, Hamburg, Germany

Source:

2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS | 2023

Funding:

National Science Foundation (USA);

Keywords:

RECONSTRUCTION;

DOI:

10.1109/IROS55552.2023.10341909

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly supervised referring expression grounding (WREG) is an attractive and challenging task that grounds target regions in images by understanding given referring expressions. WREG learns to ground target objects without manual annotations linking image regions to referring expressions during model training. Unlike the predominant grounding pattern of existing models, which locates target objects by reconstructing the region-expression correspondence, we investigate WREG from a novel perspective and enrich the prevailing pattern with self-knowledge distillation. Specifically, we propose a target-guided self-knowledge distillation approach that adopts the target prediction knowledge learned in previous training iterations as the teacher to guide the subsequent training procedure. To avoid being misled by teacher knowledge with low prediction confidence, we present an uncertainty-aware knowledge refinement strategy that adaptively rectifies the teacher knowledge by learning dynamic threshold values based on the model prediction uncertainty. To validate the proposed approach, we conduct extensive experiments on three benchmark datasets, i.e., RefCOCO, RefCOCO+, and RefCOCOg. Our approach achieves new state-of-the-art results on several splits of the benchmark datasets, showcasing the advantage of the proposed framework for WREG. The implementation code and trained models are available at: https://github.com/dami23/WREG Self KD.
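The idea summarized in the abstract, reusing an earlier iteration's target prediction as the teacher and discarding it when it is too uncertain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the entropy-based uncertainty measure, the hand-written threshold rule, and the `base_threshold` parameter are all assumptions made for illustration, whereas the paper learns its dynamic thresholds.

```python
import math

def softmax(scores):
    """Turn raw region-proposal scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Entropy divided by log(n); an assumed stand-in for prediction uncertainty."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)

def refine_teacher(teacher_probs, base_threshold=0.5):
    """Keep the teacher only if its confidence clears a dynamic threshold.

    Hypothetical rule: the threshold grows with the teacher's uncertainty,
    so diffuse predictions are rejected. Returns None when rejected.
    """
    uncertainty = normalized_entropy(teacher_probs)
    threshold = base_threshold + (1.0 - base_threshold) * uncertainty
    return teacher_probs if max(teacher_probs) >= threshold else None

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student), a common choice of distillation loss."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
    )

def self_kd_step(student_scores, prev_probs, base_threshold=0.5):
    """One self-distillation step.

    Returns (distillation_loss, current_probs); current_probs would serve as
    the teacher candidate for a later iteration.
    """
    student_probs = softmax(student_scores)
    teacher = None if prev_probs is None else refine_teacher(prev_probs, base_threshold)
    loss = 0.0 if teacher is None else kl_divergence(teacher, student_probs)
    return loss, student_probs
```

In this sketch, a uniform teacher over four proposals is rejected and contributes no loss, while a sharply peaked teacher is kept and pulls the current prediction toward the earlier one. In a real training loop this term would be added to the reconstruction-based grounding loss that the abstract describes as the prevailing pattern.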

Citation

Pages: 1254 - 1260

Page count: 7

50 records in total

[1] Weakly Supervised Referring Expression Grounding via Target-Guided Knowledge Distillation
Mi, Jinpeng
Tang, Song
Ma, Zhiyuan
Liu, Dan
Li, Qingdu
Zhang, Jianwei
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023 : 8299 - 8305
[2] Adaptive knowledge distillation and integration for weakly supervised referring expression comprehension
Mi, Jinpeng
Wermter, Stefan
Zhang, Jianwei
KNOWLEDGE-BASED SYSTEMS, 2024, 286
[3] Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
Liu, Xuejing
Li, Liang
Wang, Shuhui
Zha, Zheng-Jun
Su, Li
Huang, Qingming
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019 : 539 - 547
[4] Spatial likelihood voting with self-knowledge distillation for weakly supervised object detection
Chen, Ze
Fu, Zhihang
Huang, Jianqiang
Tao, Mingyuan
Jiang, Rongxin
Tian, Xiang
Chen, Yaowu
Hua, Xian-Sheng
IMAGE AND VISION COMPUTING, 2021, 116
[5] Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Liu, Xuejing
Li, Liang
Wang, Shuhui
Zha, Zheng-Jun
Meng, Dechao
Huang, Qingming
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 2611 - 2620
[6] Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
Wang, Liwei
Huang, Jing
Li, Yin
Xu, Kun
Yang, Zhengyuan
Yu, Dong
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 14085 - 14095
[7] Progressive Semantic Reconstruction Network for Weakly Supervised Referring Expression Grounding
Ji, Zhong
Wu, Jiahe
Wang, Yaodong
Yang, Aiping
Han, Jungong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13058 - 13070
[8] Self-knowledge distillation via dropout
Lee, Hyoje
Park, Yeachan
Seo, Hyun
Kang, Myungjoo
COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 233
[9] Self-knowledge distillation based on dynamic mixed attention
Tang, Yuan
Chen, Ying
Kongzhi yu Juece/Control and Decision, 2024, 39 (12) : 4099 - 4108
[10] Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Liu, Xuejing
Li, Liang
Wang, Shuhui
Zha, Zheng-Jun
Li, Zechao
Tian, Qi
Huang, Qingming
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3003 - 3018
