Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Cited by: 23

Authors:

Liu, Xuejing [1]

Li, Liang [1]

Wang, Shuhui [1]

Zha, Zheng-Jun [2]

Li, Zechao [3]

Tian, Qi [4]

Huang, Qingming [5, 6, 7]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, CAS, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

[2] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Anhui, Peoples R China

[3] Nanjing Univ Sci & Technol, Sch Comp Sci, Nanjing 210094, Jiangsu, Peoples R China

[4] Huawei Cloud & AI, Shenzhen 518129, Guangdong, Peoples R China

[5] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 100190, Peoples R China

[6] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China

[7] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 3

Funding:

National Key Research and Development Program of China;

Keywords:

Entity enhancement; adaptive reconstruction; referring expression grounding;

DOI:

10.1109/TPAMI.2022.3186410

Chinese Library Classification number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between target and expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects. To address the above challenges, we design an entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction. In entity enhancement, we calculate semantic similarity as supervision to select the candidate proposals. Adaptive grounding calculates the ranking score of candidate proposals upon subject, location and context with hierarchical attention. Collaborative reconstruction measures the ranking result from three perspectives: adaptive reconstruction, language reconstruction and attribute classification. The adaptive mechanism helps to alleviate the variance of different referring expressions. Experiments on five datasets show EARN outperforms existing state-of-the-art methods. Qualitative results demonstrate that the proposed EARN can better handle the situation where multiple objects of a particular category are situated together.
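The adaptive grounding step described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate proposal already has a matching score for subject, location and context, and that the expression supplies one logit per module; a softmax over those logits stands in for the adaptive weighting. The function names, the input layout and the softmax weighting are all hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_proposals(module_scores, module_logits):
    """Combine per-module matching scores into one ranking distribution.

    module_scores: dict mapping "subject", "location" and "context" to a
        list of scores, one per candidate proposal (hypothetical inputs).
    module_logits: dict with one expression-dependent logit per module.
        The softmax over these logits is an assumed form of the adaptive
        weighting, not the paper's formulation.
    """
    names = ["subject", "location", "context"]
    weights = softmax([module_logits[n] for n in names])
    num_proposals = len(module_scores["subject"])
    combined = [
        sum(w * module_scores[n][i] for w, n in zip(weights, names))
        for i in range(num_proposals)
    ]
    # Normalize over proposals so the scores can be read as a ranking.
    return softmax(combined)


# Proposals 0 and 1 share a category (equal subject scores); only the
# location and context scores separate them.
scores = {
    "subject": [2.0, 2.0, 0.5],
    "location": [0.1, 1.5, 0.2],
    "context": [0.3, 0.9, 0.1],
}
logits = {"subject": 0.0, "location": 0.0, "context": 0.0}

probs = rank_proposals(scores, logits)
best = max(range(len(probs)), key=lambda i: probs[i])
print(best, [round(p, 3) for p in probs])
```

In this toy input the subject score alone cannot separate the first two proposals, which mirrors the abstract's point that location and context are needed to distinguish same-category objects.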


Pages: 3003-3018

Page count: 16

41 items in total

[1] Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Liu, Xuejing
Li, Liang
Wang, Shuhui
Zha, Zheng-Jun
Meng, Dechao
Huang, Qingming
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2611 - 2620
[2] Progressive Semantic Reconstruction Network for Weakly Supervised Referring Expression Grounding
Ji, Zhong
Wu, Jiahe
Wang, Yaodong
Yang, Aiping
Han, Jungong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13058 - 13070
[3] Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
Liu, Xuejing
Li, Liang
Wang, Shuhui
Zha, Zheng-Jun
Su, Li
Huang, Qingming
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 539 - 547
[4] Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
Sun, Mingjie
Xiao, Jimin
Lim, Eng Gee
Liu, Si
Goulermas, John Y.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (11) : 4189 - 4195
[5] Weakly Supervised Referring Expression Grounding via Dynamic Self-Knowledge Distillation
Mi, Jinpeng
Chen, Zhiqian
Zhang, Jianwei
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023, : 1254 - 1260
[6] Weakly Supervised Referring Expression Grounding via Target-Guided Knowledge Distillation
Mi, Jinpeng
Tang, Song
Ma, Zhiyuan
Liu, Dan
Li, Qingdu
Zhang, Jianwei
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 8299 - 8305
[7] Adaptive knowledge distillation and integration for weakly supervised referring expression comprehension
Mi, Jinpeng
Wermter, Stefan
Zhang, Jianwei
KNOWLEDGE-BASED SYSTEMS, 2024, 286
[8] SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Nag, Sayan
Goswami, Koustava
Karanam, Srikrishna
COMPUTER VISION-ECCV 2024, PT XLIV, 2025, 15102 : 485 - 503
[9] Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding
Tang, Kefan
He, Lihuo
Wang, Nannan
Gao, Xinbo
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 95 - 107
[10] Universal Relocalizer for Weakly Supervised Referring Expression Grounding
Zhang, Panpan
Liu, Meng
Song, Xuemeng
Cao, Da
Gao, Zan
Nie, Liqiang
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (07)
