Knowledge Guided Transformer Network for Compositional Zero-Shot Learning

Cited by: 0
Authors
Panda, Aditya [1]
Prasad, Dipti [1]
Affiliations
[1] Indian Stat Inst, Kolkata, India
Keywords
Compositionality; Compositional zero-shot learning; state-object composition; partial association
DOI
10.1145/3687129
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Compositional Zero-shot Learning (CZSL) attempts to recognise images of new compositions of states and objects when images of only a subset of state-object compositions are available as training data. An example of CZSL is a model that must recognise images of peeled apple after being trained on images of peeled orange, ripe apple and ripe orange. There are two major challenges in solving CZSL. First, the visual features of a state vary with the context of the state-object composition. For example, a state such as ripe produces distinct visual properties in the compositions ripe orange and ripe banana. Hence, understanding the context dependency of state features is necessary to solve CZSL. Second, the extent of association between the features of a state and an object varies significantly across different images of the same composition. For example, in different images of peeled oranges, the oranges may be peeled to different extents, so the visual features of images of the class peeled orange may vary. Hence, there is significant intra-class variability among the visual features of different images of a composition. Existing approaches merely look for the presence or absence of the features of a particular state or object in a composition. Our approach looks not only for the presence of particular state features or object features but also for the extent of association between state features and object features, to better tackle the intra-class variability in the visual features of compositional images. The proposed architecture is built on a novel Knowledge Guided Transformer. The transformer-based framework is utilised to process the larger context dependency between the state and the object. Extensive experiments on the C-GQA, MIT-States and UT-Zappos50k datasets demonstrate the superiority of the proposed approach over the state of the art in both open-world and closed-world CZSL settings.
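The label spaces in the abstract's fruit example can be illustrated with a short Python sketch. It is not part of the paper: the variable names are hypothetical, and the definitions of the open-world and closed-world candidate sets follow the usual CZSL convention rather than anything stated in the abstract.

```python
from itertools import product

# Seen (training) compositions, taken from the abstract's example.
seen = {("peeled", "orange"), ("ripe", "apple"), ("ripe", "orange")}

# Primitive vocabularies recovered from the seen compositions.
states = sorted({state for state, _ in seen})
objects = sorted({obj for _, obj in seen})

# Open-world setting (assumed convention): every state-object pair
# is a candidate label at test time.
open_world = set(product(states, objects))

# Closed-world setting (assumed convention): the unseen test
# compositions are known in advance and added to the seen ones.
known_unseen = {("peeled", "apple")}
closed_world = seen | known_unseen

# Compositions the model must recognise without any training images.
unseen = open_world - seen
print(sorted(unseen))
```

With only two states and two objects the two candidate sets coincide. On larger vocabularies the open-world set grows as the product of the number of states and the number of objects, and it contains many pairs that never occur in practice.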
Pages: 25