Weakly Supervised Referring Expression Grounding via Dynamic Self-Knowledge Distillation

Cited by: 1
Authors
Mi, Jinpeng [1 ]
Chen, Zhiqian [1 ]
Zhang, Jianwei [2 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Inst Machine Intelligence IMI, Shanghai, Peoples R China
[2] Univ Hamburg, Dept Informat, Tech Aspects Multimodal Syst TAMS, Hamburg, Germany
Source
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS | 2023
Funding
US National Science Foundation;
Keywords
RECONSTRUCTION;
DOI
10.1109/IROS55552.2023.10341909
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised referring expression grounding (WREG) is an attractive yet challenging task that aims to ground target regions in images by understanding given referring expressions. WREG learns to ground target objects without manual annotations linking image regions and referring expressions during model training. Unlike the predominant grounding pattern of existing models, which locates target objects by reconstructing the region-expression correspondence, we investigate WREG from a novel perspective and enrich the prevailing pattern with self-knowledge distillation. Specifically, we propose a target-guided self-knowledge distillation approach that adopts the target prediction knowledge learned in previous training iterations as a teacher to guide the subsequent training procedure. To avoid misleading guidance from teacher knowledge with low prediction confidence, we present an uncertainty-aware knowledge refinement strategy that adaptively rectifies the teacher knowledge by learning dynamic threshold values based on the model's prediction uncertainty. To validate the proposed approach, we conduct extensive experiments on three benchmark datasets, i.e., RefCOCO, RefCOCO+, and RefCOCOg. Our approach achieves new state-of-the-art results on several splits of these benchmarks, demonstrating the advantage of the proposed framework for WREG. The implementation code and trained models are available at: https://github.com/dami23/WREG_Self_KD
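Since the abstract describes the two training mechanisms only in prose, the following is a minimal PyTorch sketch of how they could fit together. It is not the authors' released implementation: the class name SelfKDLoss, the temperature value, and the EMA-based threshold update are all illustrative assumptions (the paper itself learns dynamic thresholds from prediction uncertainty).

import torch
import torch.nn.functional as F

class SelfKDLoss(torch.nn.Module):
    """Sketch: target-guided self-KD with an uncertainty-aware dynamic threshold."""

    def __init__(self, temperature: float = 4.0, init_threshold: float = 0.5,
                 ema_momentum: float = 0.9):
        super().__init__()
        self.temperature = temperature
        self.ema_momentum = ema_momentum
        # Dynamic confidence threshold; here it tracks an exponential moving
        # average of batch confidence as a simple stand-in for the learned
        # dynamic thresholds described in the abstract.
        self.register_buffer('threshold', torch.tensor(init_threshold))

    def forward(self, student_logits: torch.Tensor,
                teacher_logits: torch.Tensor) -> torch.Tensor:
        # teacher_logits: region-matching scores snapshotted from a previous
        # training iteration, used as a fixed (detached) teacher target.
        t = self.temperature
        teacher_probs = F.softmax(teacher_logits.detach() / t, dim=-1)

        # Prediction uncertainty measured as normalized entropy; confidence
        # is its complement (low entropy -> high confidence).
        entropy = -(teacher_probs * teacher_probs.clamp_min(1e-8).log()).sum(-1)
        entropy = entropy / torch.log(torch.tensor(float(teacher_probs.size(-1))))
        confidence = 1.0 - entropy

        # Refine the teacher knowledge: keep only predictions whose
        # confidence exceeds the current dynamic threshold.
        mask = (confidence > self.threshold).float()

        # Update the threshold from the observed uncertainty statistics.
        with torch.no_grad():
            self.threshold.mul_(self.ema_momentum).add_(
                (1.0 - self.ema_momentum) * confidence.mean())

        # Standard temperature-scaled KL distillation term, averaged over
        # the retained (high-confidence) samples only.
        log_student = F.log_softmax(student_logits / t, dim=-1)
        kd = F.kl_div(log_student, teacher_probs, reduction='none').sum(-1)
        return (kd * mask).sum() * (t ** 2) / mask.sum().clamp_min(1.0)

In use, the teacher logits would be the model's own region-matching scores saved from an earlier iteration, e.g. loss = SelfKDLoss()(current_logits, previous_iteration_logits); the EMA update is only one plausible way to realize an adaptive threshold under the assumptions above.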
Pages: 1254 - 1260
Page count: 7
Related Papers
50 records in total
  • [21] Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
    Sun, Mingjie
    Xiao, Jimin
    Lim, Eng Gee
    Liu, Si
    Goulermas, John Y.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (11) : 4189 - 4195
  • [22] Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
    Chen, Kan
    Gao, Jiyang
    Nevatia, Ram
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4042 - 4050
  • [23] Sliding Cross Entropy for Self-Knowledge Distillation
    Lee, Hanbeen
    Kim, Jeongho
    Woo, Simon S.
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1044 - 1053
  • [24] Self-Knowledge Distillation with Progressive Refinement of Targets
    Kim, Kyungyul
    Ji, ByeongMoon
    Yoon, Doyoung
    Hwang, Sangheum
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6547 - 6556
  • [25] Self-knowledge distillation for surgical phase recognition
    Zhang, Jinglu
    Barbarisi, Santiago
    Kadkhodamohammadi, Abdolrahim
    Stoyanov, Danail
    Luengo, Imanol
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2024, 19 (01) : 61 - 68
  • [26] Diversified branch fusion for self-knowledge distillation
    Long, Zuxiang
    Ma, Fuyan
    Sun, Bin
    Tan, Mingkui
    Li, Shutao
    INFORMATION FUSION, 2023, 90 : 12 - 22
  • [27] Noisy Self-Knowledge Distillation for Text Summarization
    Liu, Yang
    Shen, Sheng
    Lapata, Mirella
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 692 - 703
  • [29] Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation
    Ji, Mingi
    Shin, Seungjae
    Hwang, Seunghyun
    Park, Gibeom
    Moon, Il-Chul
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 10659 - 10668
  • [30] Weakly Supervised Cross-lingual Semantic Relation Classification via Knowledge Distillation
    Vyas, Yogarshi
    Carpuat, Marine
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5285 - 5296