RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension

被引:8
|
作者
Jin, Lei [1 ,2 ]
Luo, Gen [1 ]
Zhou, Yiyi [1 ,2 ]
Sun, Xiaoshuai [1 ,2 ]
Jiang, Guannan [3 ]
Shu, Annan [3 ]
Ji, Rongrong [1 ,2 ]
机构
[1] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Xiamen 361005, Peoples R China
[2] Xiamen Univ, Inst Artificial Intelligence, Xiamen 361005, Peoples R China
[3] Contemporary Amperex Technol Co Ltd CATE, Intelligent Mfg Dept, Ningde, Peoples R China
来源
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023年
基金
中国国家自然科学基金; 国家重点研发计划;
关键词
D O I
10.1109/CVPR52729.2023.00263
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Referring Expression Comprehension (REC) is a task of grounding the referent based on an expression, and its development is greatly limited by expensive instance-level annotations. Most existing weakly supervised methods are built based on two-stage detection networks, which are computationally expensive. In this paper, we resort to the efficient one-stage detector and propose a novel weakly supervised model called RefCLIP. Specifically, RefCLIP redefines weakly supervised REC as an anchor-text matching problem, which can avoid the complex post-processing in existing methods. To achieve weakly supervised learning, we introduce anchor-based contrastive loss to optimize RefCLIP via numerous anchor-text pairs. Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models. With our careful designs, this scheme can even help existing REC models achieve better weakly supervised performance than RefCLIP, e.g., TransVG and SimREC. To validate our approaches, we conduct extensive experiments on four REC benchmarks, i.e., RefCOCO, RefCOCO+, RefCOCOg and ReferItGame. Experimental results not only report our significant performance gains over existing weakly supervised models, e.g., +24.87% on RefCOCO, but also show the 5x faster inference speed. Project: https://refclip.github.io.
引用
收藏
页码:2681 / 2690
页数:10
相关论文
共 50 条
  • [11] Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding
    Liu, Xuejing
    Li, Liang
    Wang, Shuhui
    Zha, Zheng-Jun
    Su, Li
    Huang, Qingming
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 539 - 547
  • [12] Weakly Supervised Referring Expression Grounding via Target-Guided Knowledge Distillation
    Mi, Jinpeng
    Tang, Song
    Ma, Zhiyuan
    Liu, Dan
    Li, Qingdu
    Zhang, Jianwei
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 8299 - 8305
  • [13] Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
    Liu, Xuejing
    Li, Liang
    Wang, Shuhui
    Zha, Zheng-Jun
    Li, Zechao
    Tian, Qi
    Huang, Qingming
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3003 - 3018
  • [14] Dynamic Graph Attention for Referring Expression Comprehension
    Yang, Sibei
    Li, Guanbin
    Yu, Yizhou
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4643 - 4652
  • [15] Exploring Logical Reasoning for Referring Expression Comprehension
    Cheng, Ying
    Wang, Ruize
    Yu, Jiashuo
    Zhao, Rui-Wei
    Zhang, Yuejie
    Feng, Rui
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5047 - 5055
  • [16] InterREC: An Interpretable Method for Referring Expression Comprehension
    Wang, Wenbin
    Pagnucco, Maurice
    Xu, Chengpei
    Song, Yang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9330 - 9342
  • [17] Image Segmentation With Language Referring Expression and Comprehension
    Sun, Jiaxing
    Li, Yujie
    Cai, Jintong
    Lu, Huimin
    Serikawa, Seiichi
    IEEE SENSORS JOURNAL, 2022, 22 (18) : 17406 - 17413
  • [18] ScanFormer: Referring Expression Comprehension by Iteratively Scanning
    Sul, Wei
    Miao, Peihan
    Doul, Huanzhang
    Li, Xi
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13449 - 13458
  • [19] Referring Expression Generation and Comprehension via Attributes
    Liu, Jingyu
    Wang, Liang
    Yang, Ming-Hsuan
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4866 - 4874
  • [20] Revisiting Counterfactual Problems in Referring Expression Comprehension
    Yu, Zhihan
    Li, Ruifan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13438 - 13448