Weakly Supervised Referring Expression Grounding via Dynamic Self-Knowledge Distillation

Cited by: 1
Authors
Mi, Jinpeng [1]
Chen, Zhiqian [1]
Zhang, Jianwei [2]
Affiliations
[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence IMI, Shanghai, Peoples R China
[2] Univ Hamburg, Dept Informat, Tech Aspects Multimodal Syst TAMS, Hamburg, Germany
Source
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS | 2023
Funding
U.S. National Science Foundation;
Keywords
RECONSTRUCTION;
DOI
10.1109/IROS55552.2023.10341909
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised referring expression grounding (WREG) is an attractive and challenging task that grounds target regions in images by understanding given referring expressions. WREG learns to ground target objects without manual annotations linking image regions to referring expressions during model training. Existing models predominantly locate target objects by reconstructing the region-expression correspondence; we investigate WREG from a different perspective and enrich this prevailing pattern with self-knowledge distillation. Specifically, we propose a target-guided self-knowledge distillation approach that adopts the target prediction knowledge learned in previous training iterations as the teacher to guide the subsequent training procedure. To avoid being misled by teacher knowledge with low prediction confidence, we present an uncertainty-aware knowledge refinement strategy that adaptively rectifies the teacher knowledge by learning dynamic threshold values based on the model's prediction uncertainty. To validate the proposed approach, we conduct extensive experiments on three benchmark datasets: RefCOCO, RefCOCO+, and RefCOCOg. Our approach achieves new state-of-the-art results on several splits of the benchmark datasets, showcasing the advantage of the proposed framework for WREG. The implementation code and trained models are available at: https://github.com/dami23/WREG Self KD.
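The two ideas named in the abstract (using earlier predictions as the teacher, and down-weighting an uncertain teacher via a dynamic threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the loss or the threshold rule, so the normalized-entropy uncertainty measure, the moving-average threshold, the linear down-weighting, and all names (`DynamicThreshold`, `distillation_loss`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn raw region scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy divided by log(n): 0 is fully confident, 1 is uniform."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) over candidate regions."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher, student)
        if t > 0.0
    )


class DynamicThreshold:
    """Assumed rule: exponential moving average of teacher uncertainty."""

    def __init__(self, init=0.5, momentum=0.9):
        self.value = init
        self.momentum = momentum

    def update(self, uncertainty):
        self.value = self.momentum * self.value + (1.0 - self.momentum) * uncertainty
        return self.value


def distillation_loss(student_scores, teacher_probs, threshold):
    """Weighted KL between the previous-iteration teacher and the student.

    A teacher whose uncertainty exceeds the threshold is down-weighted
    linearly (an assumption, not the paper's refinement rule).
    """
    student = softmax(student_scores)
    u = normalized_entropy(teacher_probs)
    if u <= threshold:
        weight = 1.0
    else:
        weight = max(0.0, (1.0 - u) / max(1.0 - threshold, 1e-12))
    return weight * kl_divergence(teacher_probs, student), weight


# Usage: the teacher is the region distribution predicted at an earlier iteration.
threshold = DynamicThreshold()
teacher = [0.9, 0.05, 0.05]
loss, weight = distillation_loss([2.0, 0.0, 0.0], teacher, threshold.value)
threshold.update(normalized_entropy(teacher))
```

In this sketch a confident teacher contributes its full distillation signal, while a near-uniform teacher contributes almost nothing, which is one simple way to avoid propagating low-confidence predictions.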
Pages: 1254 - 1260
Number of pages: 7
Related papers
50 in total
  • [1] Weakly Supervised Referring Expression Grounding via Target-Guided Knowledge Distillation
    Mi, Jinpeng
    Tang, Song
    Ma, Zhiyuan
    Liu, Dan
    Li, Qingdu
    Zhang, Jianwei
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 8299 - 8305
  • [2] Adaptive knowledge distillation and integration for weakly supervised referring expression comprehension
    Mi, Jinpeng
    Wermter, Stefan
    Zhang, Jianwei
    KNOWLEDGE-BASED SYSTEMS, 2024, 286
  • [3] Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
    Liu, Xuejing
    Li, Liang
    Wang, Shuhui
    Zha, Zheng-Jun
    Su, Li
    Huang, Qingming
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 539 - 547
  • [4] Spatial likelihood voting with self-knowledge distillation for weakly supervised object detection
    Chen, Ze
    Fu, Zhihang
    Huang, Jianqiang
    Tao, Mingyuan
    Jiang, Rongxin
    Tian, Xiang
    Chen, Yaowu
    Hua, Xian-Sheng
    IMAGE AND VISION COMPUTING, 2021, 116
  • [5] Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
    Liu, Xuejing
    Li, Liang
    Wang, Shuhui
    Zha, Zheng-Jun
    Meng, Dechao
    Huang, Qingming
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2611 - 2620
  • [6] Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
    Wang, Liwei
    Huang, Jing
    Li, Yin
    Xu, Kun
    Yang, Zhengyuan
    Yu, Dong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 14085 - 14095
  • [7] Progressive Semantic Reconstruction Network for Weakly Supervised Referring Expression Grounding
    Ji, Zhong
    Wu, Jiahe
    Wang, Yaodong
    Yang, Aiping
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13058 - 13070
  • [8] Self-knowledge distillation via dropout
    Lee, Hyoje
    Park, Yeachan
    Seo, Hyun
    Kang, Myungjoo
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 233
  • [9] Self-knowledge distillation based on dynamic mixed attention
    Tang, Yuan
    Chen, Ying
    Kongzhi yu Juece/Control and Decision, 2024, 39 (12) : 4099 - 4108
  • [10] Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
    Liu, Xuejing
    Li, Liang
    Wang, Shuhui
    Zha, Zheng-Jun
    Li, Zechao
    Tian, Qi
    Huang, Qingming
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3003 - 3018