On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning

Cited by: 2
Authors
Anwaar, Muhammad Umer [1]
Pan, Zhihui
Kleinsteuber, Martin
Affiliations
[1] Unite Network SE, Leipzig, Germany
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Compositional Learning; Multimodal; Variational Graph Autoencoder; CZSL; Open World; Composition of Concepts;
DOI
10.1145/3503161.3547798
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Humans are able to identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot Learning (CZSL) is to learn compositions of primitive concepts, i.e., objects and states, in such a way that even their novel compositions can be zero-shot classified. In this work, we do not assume any prior knowledge of the feasibility of novel compositions, i.e., we consider the open-world setting, where infeasible compositions dominate the search space. We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts (nodes) as well as the feasibility of their compositions (via edges). Such modelling makes CVGAE scalable to real-world application scenarios. This is in contrast to the SOTA method, CGE [33], which is computationally very expensive; e.g., for the benchmark C-GQA dataset, CGE requires 3.94 × 10^5 nodes, whereas CVGAE requires only 1323 nodes. We learn a mapping of the graph and image embeddings onto a common embedding space. CVGAE adopts a deep metric learning approach and learns a similarity metric in this space via a bi-directional contrastive loss between projected graph and image embeddings. We validate the effectiveness of our approach on three benchmark datasets. We also demonstrate via an image retrieval task that the representations learnt by CVGAE are better suited for compositional generalization.
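The bi-directional contrastive loss mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact formulation, so the code below assumes a common symmetric, InfoNCE-style variant: cosine similarities between paired image and graph embeddings are scored with a softmax cross-entropy in both directions (image to graph and graph to image) and the two terms are averaged. The function names and the `temperature` parameter are hypothetical, chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def matched_cross_entropy(sim, temperature):
    """Mean softmax cross-entropy where row i's correct column is i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(sim)


def bidirectional_contrastive_loss(image_emb, graph_emb, temperature=0.1):
    """Assumed symmetric contrastive loss; image_emb[i] is paired with graph_emb[i]."""
    images = [l2_normalize(v) for v in image_emb]
    graphs = [l2_normalize(v) for v in graph_emb]
    # sim[i][j] = cosine similarity between image i and graph embedding j
    sim = [[sum(a * b for a, b in zip(img, gra)) for gra in graphs] for img in images]
    sim_transposed = [list(col) for col in zip(*sim)]
    image_to_graph = matched_cross_entropy(sim, temperature)
    graph_to_image = matched_cross_entropy(sim_transposed, temperature)
    return 0.5 * (image_to_graph + graph_to_image)


# Toy data: two image embeddings and their matching graph embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = bidirectional_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = bidirectional_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy setting the loss is small when each image embedding lies closest to its own graph embedding and large when the pairs are swapped, which is the behaviour a similarity metric in a common embedding space is meant to reward.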
Pages: 4645-4654
Page count: 10
Related papers
48 entries in total
  • [1] A review of uncertainty quantification in deep learning: Techniques, applications and challenges
    Abdar, Moloud
    Pourpanah, Farhad
    Hussain, Sadiq
    Rezazadegan, Dana
    Liu, Li
    Ghavamzadeh, Mohammad
    Fieguth, Paul
    Cao, Xiaochun
    Khosravi, Abbas
    Acharya, U. Rajendra
    Makarenkov, Vladimir
    Nahavandi, Saeid
    [J]. INFORMATION FUSION, 2021, 76 : 243 - 297
  • [2] Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning
    Al-Halah, Ziad
    Tapaswi, Makarand
    Stiefelhagen, Rainer
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 5975 - 5984
  • [3] [Anonymous], 2017, P 34 INT C MACH LEAR
  • [4] Compositional Learning of Image-Text Query for Image Retrieval
    Anwaar, Muhammad Umer
    Labintcev, Egor
    Kleinsteuber, Martin
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1139 - 1148
  • [5] RECOGNITION-BY-COMPONENTS - A THEORY OF HUMAN IMAGE UNDERSTANDING
    BIEDERMAN, I
    [J]. PSYCHOLOGICAL REVIEW, 1987, 94 (02) : 115 - 147
  • [6] Bohnet B, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P2642
  • [7] Bojanowski P., 2017, Trans. Assoc. Comput. Linguistics, V5, P135, DOI 10.1162/tacl_a_00051
  • [8] Chao Wei-Lun, 2017, arXiv:1605.04253 [cs.CV]
  • [9] Inferring Analogous Attributes
    Chen, Chao-Yeh
    Grauman, Kristen
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 200 - 207
  • [10] Chen ZheweiWei Ming, 2020, ICML