On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning

Times cited: 2

Authors:

Anwaar, Muhammad Umer [1]

Pan, Zhihui

Kleinsteuber, Martin

Affiliation:

[1] Unite Network SE, Leipzig, Germany

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

Compositional Learning; Multimodal; Variational Graph Autoencoder; CZSL; Open World; Composition of Concepts;

DOI:

10.1145/3503161.3547798

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Humans are able to identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot Learning (CZSL) is to learn compositions of primitive concepts, i.e., objects and states, in such a way that even their novel compositions can be classified zero-shot. In this work, we do not assume any prior knowledge about the feasibility of novel compositions, i.e., we work in the open-world setting, where infeasible compositions dominate the search space. We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts (nodes) as well as the feasibility of their compositions (via edges). Such modelling makes CVGAE scalable to real-world application scenarios. This is in contrast to the state-of-the-art (SOTA) method CGE [33], which is computationally very expensive: e.g., for the benchmark C-GQA dataset, CGE requires 3.94 × 10^5 nodes, whereas CVGAE requires only 1323 nodes. We learn a mapping of the graph and image embeddings onto a common embedding space. CVGAE adopts a deep metric learning approach and learns a similarity metric in this space via a bi-directional contrastive loss between the projected graph and image embeddings. We validate the effectiveness of our approach on three benchmark datasets. We also demonstrate via an image retrieval task that the representations learnt by CVGAE are better suited for compositional generalization.
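The abstract names three mechanisms: primitive concepts as graph nodes, composition feasibility scored via edges, and a bi-directional contrastive loss between projected graph and image embeddings. The following is a minimal NumPy sketch of those ideas under stated assumptions, not the authors' implementation. The one-layer GCN encoder with reparameterized Gaussian node embeddings and the inner-product edge decoder follow the standard variational graph autoencoder recipe; the loss is a symmetric InfoNCE-style formulation; the concatenate-then-project composition embedding, the temperature of 0.1, and all function names and dimensions are hypothetical. The node-count helper assumes C-GQA has 453 states and 870 objects (an assumption, not stated in this record): these sum to the 1323 primitive nodes quoted in the abstract, and their product, 394,110, is consistent with the quoted CGE figure only if CGE is taken to need one node per state-object pair.

```python
import numpy as np

# Hypothetical sketch, NOT the CVGAE authors' code. All names, sizes, the
# one-layer GCN encoder, the concat+linear projection and the temperature
# are illustrative assumptions.


def node_counts(num_states, num_objects):
    """Graph size if only primitives are nodes vs. one node per state-object pair."""
    primitive_nodes = num_states + num_objects      # states + objects
    composition_nodes = num_states * num_objects    # full open-world pair space
    return primitive_nodes, composition_nodes


def encode_variational(adj, feats, w_mu, w_logvar, rng):
    """One-layer GCN encoder giving a diagonal Gaussian per node (VGAE-style)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    h = a_norm @ feats
    mu, logvar = h @ w_mu, h @ w_logvar
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    # KL(q(z | X, A) || N(0, I)), summed over dimensions, averaged over nodes
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return z, kl


def edge_feasibility(z, state_idx, object_idx):
    """Inner-product decoder: sigmoid(z_s . z_o) as a state-object edge score."""
    logits = np.sum(z[state_idx] * z[object_idx], axis=-1)
    return 1.0 / (1.0 + np.exp(-logits))


def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)          # numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def bidirectional_contrastive_loss(img_emb, comp_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of img_emb is the positive for row i of comp_emb."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    comp = comp_emb / np.linalg.norm(comp_emb, axis=1, keepdims=True)
    sim = img @ comp.T / temperature                 # scaled cosine similarities
    idx = np.arange(sim.shape[0])
    image_to_graph = -log_softmax(sim, axis=1)[idx, idx].mean()
    graph_to_image = -log_softmax(sim, axis=0)[idx, idx].mean()
    return 0.5 * (image_to_graph + graph_to_image)


# --- Toy usage with random weights (3 states, 4 objects) ---
rng = np.random.default_rng(0)
n_states, n_objects, d_in, d_z, d_common = 3, 4, 8, 5, 6
n = n_states + n_objects                             # one node per primitive
adj = np.zeros((n, n))
adj[0, n_states] = adj[n_states, 0] = 1.0            # one seen pair: state 0 + object 0
feats = rng.standard_normal((n, d_in))
z, kl = encode_variational(adj, feats,
                           rng.standard_normal((d_in, d_z)),
                           0.1 * rng.standard_normal((d_in, d_z)), rng)

states = np.array([0, 1, 2])
objects = n_states + np.array([0, 1, 2])
feasibility = edge_feasibility(z, states, objects)

# Project compositions (concatenated primitive embeddings) and images into a common space
w_graph = rng.standard_normal((2 * d_z, d_common))
w_image = rng.standard_normal((10, d_common))
comp_emb = np.concatenate([z[states], z[objects]], axis=1) @ w_graph
img_emb = rng.standard_normal((3, 10)) @ w_image     # stand-in image features
loss = bidirectional_contrastive_loss(img_emb, comp_emb)
print(node_counts(453, 870))                         # (1323, 394110)
```

Representing only primitives as nodes keeps the node count linear in the number of states and objects, while feasibility is carried by edge scores rather than by extra composition nodes; this is the scalability trade-off the abstract contrasts with CGE.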

References

Pages: 4645-4654

Number of pages: 10

48 references in total

[1] A review of uncertainty quantification in deep learning: Techniques, applications and challenges
Abdar, Moloud
Pourpanah, Farhad
Hussain, Sadiq
Rezazadegan, Dana
Liu, Li
Ghavamzadeh, Mohammad
Fieguth, Paul
Cao, Xiaochun
Khosravi, Abbas
Acharya, U. Rajendra
Makarenkov, Vladimir
Nahavandi, Saeid
[J]. INFORMATION FUSION, 2021, 76 : 243 - 297
[2] Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning
Al-Halah, Ziad
Tapaswi, Makarand
Stiefelhagen, Rainer
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 5975 - 5984
[3] [Anonymous], 2017, Proceedings of the 34th International Conference on Machine Learning (ICML)
[4] Compositional Learning of Image-Text Query for Image Retrieval
Anwaar, Muhammad Umer
Labintcev, Egor
Kleinsteuber, Martin
[J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1139 - 1148
[5] RECOGNITION-BY-COMPONENTS - A THEORY OF HUMAN IMAGE UNDERSTANDING
BIEDERMAN, I
[J]. PSYCHOLOGICAL REVIEW, 1987, 94 (02) : 115 - 147
[6] Bohnet B, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P2642
[7] Bojanowski P., 2017, Trans. Assoc. Comput. Linguistics, V5, P135, DOI 10.1162/tacl_a_00051
[8] Chao Wei-Lun, 2017, arXiv:1605.04253 [cs.CV]
[9] Inferring Analogous Attributes
Chen, Chao-Yeh
Grauman, Kristen
[J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 200 - 207
[10] Chen Ming, Wei Zhewei, 2020, ICML