Knowledge Guided Transformer Network for Compositional Zero-Shot Learning

Cited by: 0

Authors:

Panda, Aditya [1]

Prasad, Dipti [1]

Affiliations:

[1] Indian Stat Inst, Kolkata, India

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024 / Vol. 20 / Issue 11

Keywords:

Compositionality; Compositional zero-shot learning; state-object composition; partial association

DOI:

10.1145/3687129

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Compositional Zero-shot Learning (CZSL) attempts to recognise images of new compositions of states and objects when images of only a subset of state-object compositions are available as training data. For example, a CZSL model trained on images of peeled orange, ripe apple and ripe orange should recognise images of peeled apple. There are two major challenges in solving CZSL. First, the visual features of a state vary depending on the context of a state-object composition. For example, a state such as ripe produces distinct visual properties in the compositions ripe orange and ripe banana. Hence, understanding the context dependency of state features is necessary to solve CZSL. Second, the extent of association between the features of a state and an object varies significantly across different images of the same composition. For example, in different images of peeled oranges, the oranges may be peeled to different extents. As a consequence, the visual features of images of the class peeled orange may vary. Hence, there is significant intra-class variability among the visual features of different images of a composition. Existing approaches merely look for the presence or absence of features of a particular state or object in a composition. Our approach looks not only for the presence of particular state features or object features but also for the extent of association between state features and object features, to better tackle the intra-class variability in visual features of compositional images. The proposed architecture is constructed using a novel Knowledge Guided Transformer. The transformer-based framework is utilised for processing larger context dependency between the state and object. Extensive experiments on the C-GQA, MIT-States and UT-Zappos50k datasets demonstrate the superiority of the proposed approach over the state of the art in both open-world and closed-world CZSL settings.

Pages: 25

24 items in total

[21] Preserving text space integrity for robust compositional zero-shot learning via mixture of pretrained experts
Hao, Zehua
Liu, Fang
Jiao, Licheng
Du, Yaoyang
Li, Shuo
Wang, Hao
Li, Pengfang
Liu, Xu
Chen, Puhua
NEUROCOMPUTING, 2025, 614
[22] Simple Primitives With Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-Shot Learning
Liu, Zhe
Li, Yun
Yao, Lina
Chang, Xiaojun
Fang, Wei
Wu, Xiaojun
El Saddik, Abdulmotaleb
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 543 - 560
[23] Focusing on Valid Search Space in Open-World Compositional Zero-Shot Learning by Leveraging Misleading Answers
Kim, Soohyeong
Lee, Sangjun
Choi, Yong Suk
IEEE ACCESS, 2024, 12 : 165822 - 165830
[24] Zero-shot Scene Graph Generation via Triplet Calibration and Reduction
Li, Jiankai
Wang, Yunhong
Li, Weixin
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (01)