LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

Cited by: 19

Authors:

Li, Gen [1]

Jampani, Varun [2]

Sun, Deqing [2]

Sevilla-Lara, Laura [1]

Affiliations:

[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[2] Google Res, Mountain View, CA USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

DOI:

10.1109/CVPR52729.2023.01051

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Humans excel at acquiring knowledge through observation. For example, we can learn to use new tools by watching demonstrations. This skill is fundamental for intelligent systems to interact with the world. A key step to acquire this skill is to identify what part of the object affords each action, which is called affordance grounding. In this paper, we address this problem and propose a framework called LOCATE that can identify matching object parts across images, to transfer knowledge from images where an object is being used (exocentric images used for learning), to images where the object is inactive (egocentric ones used to test). To this end, we first find interaction areas and extract their feature embeddings. Then we learn to aggregate the embeddings into compact prototypes (human, object part, and background), and select the one representing the object part. Finally, we use the selected prototype to guide affordance grounding. We do this in a weakly supervised manner, learning only from image-level affordance and object labels. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods by a large margin on both seen and unseen objects.
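
The pipeline described in the abstract (extract embeddings from interaction areas, aggregate them into three prototypes, select the object-part prototype, use it to guide grounding) can be illustrated with a toy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: plain k-means stands in for the learned aggregation, cosine similarity to a single egocentric feature stands in for the selection step, and the function names `build_prototypes`, `select_part_prototype`, and `grounding_map` are hypothetical. The weakly supervised training on image-level labels is not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(embeddings, k=3, iters=20):
    """Aggregate embeddings into k prototypes with plain k-means (an assumption)."""
    # Deterministic farthest-point initialisation, starting from the first embedding.
    centroids = [list(embeddings[0])]
    while len(centroids) < k:
        farthest = max(embeddings, key=lambda e: min(sq_dist(e, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for e in embeddings:
            nearest = min(range(k), key=lambda i: sq_dist(e, centroids[i]))
            clusters[nearest].append(e)
        # Recompute each centroid as its cluster mean; keep the old one if a cluster is empty.
        centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def select_part_prototype(prototypes, ego_feature):
    """Pick the prototype most similar to an egocentric object feature (assumed rule)."""
    return max(range(len(prototypes)), key=lambda i: cosine(prototypes[i], ego_feature))


def grounding_map(prototype, feature_grid):
    """Score each spatial location by its non-negative cosine similarity to the prototype."""
    return [[max(0.0, cosine(prototype, f)) for f in row] for row in feature_grid]


# Toy 2-D embeddings: two "human"-like, two "object part"-like, two "background"-like.
exo_embeddings = [
    [1.0, 0.0], [0.9, 0.1],
    [0.0, 1.0], [0.1, 0.9],
    [-1.0, -1.0], [-0.9, -1.1],
]
ego_feature = [0.0, 1.0]  # hypothetical feature of the inactive object

prototypes = build_prototypes(exo_embeddings, k=3)
part_index = select_part_prototype(prototypes, ego_feature)
heatmap = grounding_map(
    prototypes[part_index],
    [[[0.0, 1.0], [1.0, 0.0]], [[-1.0, -1.0], [0.05, 0.95]]],
)
```

In this toy setup, `heatmap` is highest at locations whose features resemble the selected prototype and zero where the similarity is negative, which mirrors the idea of using the object-part prototype to guide affordance grounding.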

Pages: 10922-10931

Page count: 10
