MetaCLUE: Towards Comprehensive Visual Metaphors Research

Cited by: 14

Authors:

Akula, Arjun R. [1]
Driscoll, Brendan [1]
Narayana, Pradyumna [1]
Changpinyo, Soravit [1]
Jia, Zhiwei [1]
Damle, Suyash [1]
Pruthi, Garima [1]
Basu, Sugato [1]
Guibas, Leonidas [1]
Freeman, William T. [1]
Li, Yuanzhen [1]
Jampani, Varun [1]

Affiliations:

[1] Google, Mountain View, CA 94043 USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

SIMILARITY;

DOI:

10.1109/CVPR52729.2023.02222

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, metaphorical comprehension of images remains relatively unexplored. Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor. We also collect high-quality and rich metaphor annotations (abstract objects, concepts, relationships along with their corresponding object boxes) as there do not exist any datasets that facilitate the evaluation of these tasks. We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations, highlighting strengths and weaknesses of current approaches in visual metaphor classification, localization, understanding (retrieval, question answering, captioning) and generation (text-to-image synthesis) tasks. We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities. Project page: https://metaclue.github.io

Citation details

Pages: 23201-23211

Number of pages: 11

References: 58 in total
