M3R: Masked Token Mixup and Cross-Modal Reconstruction for Zero-Shot Learning

Cited: 0
Authors
Zhao, Peng [1 ]
Wang, Qiangchang [1 ]
Yin, Yilong [1 ]
Affiliation
[1] Shandong Univ, Sch Software, Jinan, Shandong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
zero-shot learning; mixup; masked image modeling; Transformer;
DOI
10.1145/3581783.3612104
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In zero-shot learning (ZSL), learned representation spaces are often biased toward seen classes, limiting the ability to recognize previously unseen classes. In this paper, we propose Masked token Mixup and cross-Modal Reconstruction for zero-shot learning, termed M3R, which significantly alleviates this bias. M3R consists of three components: Random Token Mixup (RTM), Unseen Class Detection (UCD), and Hard Cross-modal Reconstruction (HCR). First, mappings learned without proper adaptation to unseen classes are biased toward seen classes. To address this, RTM generates diverse unseen-class agents, broadening the representation space to cover unknown classes. It is applied at a randomly selected layer of the Vision Transformer, producing smooth low- and high-level representation-space boundaries that cover rich attributes. Second, the unseen-class agents generated by RTM may be confused with seen-class samples. To overcome this, UCD is designed to yield higher entropy for unseen classes, thereby distinguishing seen classes from unseen ones. Third, to further mitigate the bias toward seen classes and to explore associations between semantics and visual images, HCR reconstructs masked pixels from a few discriminative tokens and attribute embeddings. This enables the model to develop a deep understanding of image content and build strong connections between semantic attributes and visual information. Both qualitative and quantitative results demonstrate the effectiveness and usefulness of the proposed M3R model.
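The abstract does not give implementation details for RTM, so the sketch below is only an illustrative assumption: token sequences from two samples, taken at a randomly chosen Transformer layer, are linearly interpolated with a Beta-sampled mixup coefficient to form an unseen-class agent. The function names (`random_token_mixup`, `pick_mix_layer`) and the Beta(α, α) sampling are hypothetical, not from the paper.

```python
import numpy as np

def random_token_mixup(tokens_a, tokens_b, alpha=1.0, rng=None):
    """Hypothetical RTM sketch: mix two token sequences of shape
    (num_tokens, dim) taken at the same Transformer layer.

    Returns the mixed sequence (an "unseen-class agent") and the
    mixup coefficient lambda sampled from Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))          # mixup coefficient in [0, 1]
    mixed = lam * tokens_a + (1.0 - lam) * tokens_b
    return mixed, lam

def pick_mix_layer(num_layers, rng=None):
    """Choose the random Transformer layer index at which to apply RTM."""
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.integers(0, num_layers))
```

Applying the mixup at a random depth (rather than always at the input) is what, per the abstract, smooths both low- and high-level representation boundaries.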
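UCD is described only as producing "greater entropy values for unseen classes". A minimal sketch of that idea, under the assumption that detection thresholds the entropy of the softmax prediction (the threshold value and function names are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prediction_entropy(logits):
    """Shannon entropy of the predicted class distribution per sample."""
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def detect_unseen(logits, threshold):
    """Flag samples whose prediction entropy exceeds the threshold
    as candidate unseen-class samples (hypothetical UCD rule)."""
    return prediction_entropy(logits) > threshold
```

A confident (peaked) prediction over seen classes has near-zero entropy, while a near-uniform prediction approaches log(K) for K classes, which is the gap such a detector exploits.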
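For HCR, the abstract says masked pixels are reconstructed from a few discriminative tokens and attribute embeddings. The objective itself is unspecified; the sketch below assumes, in the spirit of masked image modeling, a mean-squared reconstruction error computed only over masked positions (the function name and the MSE choice are assumptions):

```python
import numpy as np

def masked_reconstruction_loss(pred_pixels, target_pixels, mask):
    """Hypothetical HCR-style objective: MSE restricted to masked
    positions. `mask` is 1 where pixels were masked, 0 elsewhere."""
    diff = (pred_pixels - target_pixels) ** 2
    # Average the squared error over masked positions only.
    return float((diff * mask).sum() / np.maximum(mask.sum(), 1))
```

Restricting the loss to masked positions forces the model to infer hidden content from the surviving tokens and the attribute embeddings, which is how such an objective couples semantics to visual information.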
Pages: 3161-3171
Page count: 11
Related papers
50 records
  • [1] CROSS-MODAL REPRESENTATION RECONSTRUCTION FOR ZERO-SHOT CLASSIFICATION
    Wang, Yu
    Zhao, Shenjie
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2820 - 2824
  • [2] Cross-modal Zero-shot Hashing
    Liu, Xuanwu
    Li, Zhao
    Wang, Jun
    Yu, Guoxian
    Domeniconi, Carlotta
    Zhang, Xiangliang
    2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 449 - 458
  • [3] Cross-modal Representation Learning for Zero-shot Action Recognition
    Lin, Chung-Ching
    Lin, Kevin
    Wang, Lijuan
    Liu, Zicheng
    Li, Linjie
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19946 - 19956
  • [4] Manifold regularized cross-modal embedding for zero-shot learning
    Ji, Zhong
    Yu, Yunlong
    Pang, Yanwei
    Guo, Jichang
    Zhang, Zhongfei
    INFORMATION SCIENCES, 2017, 378 : 48 - 58
  • [5] Cross-modal propagation network for generalized zero-shot learning
    Guo, Ting
    Liang, Jianqing
    Liang, Jiye
    Xie, Guo-Sen
    PATTERN RECOGNITION LETTERS, 2022, 159 : 125 - 131
  • [6] Generalized Zero-Shot Cross-Modal Retrieval
    Dutta, Titir
    Biswas, Soma
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (12) : 5953 - 5962
  • [7] DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning
    Chen, Zhuo
    Huang, Yufeng
    Chen, Jiaoyan
    Geng, Yuxia
    Zhang, Wen
    Fang, Yin
    Pan, Jeff Z.
    Chen, Huajun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 405 - 413
  • [8] Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification
    Fang, Zhiyu
    Zhu, Xiaobin
    Yang, Chun
    Han, Zheng
    Qin, Jingyan
    Yin, Xu-Cheng
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6605 - 6613
  • [9] Cross-modal prototype learning for zero-shot handwritten character recognition
    Ao, Xiang
    Zhang, Xu-Yao
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2022, 131
  • [10] A Cross-Modal Alignment for Zero-Shot Image Classification
    Wu, Lu
    Wu, Chenyu
    Guo, Han
    Zhao, Zhihao
    IEEE ACCESS, 2023, 11 : 9067 - 9073