A Cross-Modal Alignment for Zero-Shot Image Classification

Cited by: 5
|
Authors
Wu, Lu [1 ,2 ]
Wu, Chenyu [2 ]
Guo, Han [3 ]
Zhao, Zhihao [2 ]
Affiliations
[1] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen 518000, Peoples R China
[2] Wuhan Univ Technol, Sch Informat Engn, Wuhan 430070, Peoples R China
[3] Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430079, Peoples R China
Keywords
Visualization; Semantics; Training data; Feature extraction; Object recognition; Monitoring; Image classification; Cross-modal alignment; zero-shot image classification; text attribute query; cosine similarity
DOI
10.1109/ACCESS.2023.3237966
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Unlike mainstream classification methods that rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key idea is to use text-attribute queries learned from seen classes to guide local feature responses on unseen classes. First, an encoder aligns visual features with their corresponding text attributes in a shared semantic space. Second, an attention module produces response maps by activating the feature maps with the text-attribute queries. Finally, a cosine distance metric measures the matching degree between each text attribute and its corresponding feature response. Experimental results show that the method outperforms existing embedding-based zero-shot learning methods as well as generative methods on the CUB-200-2011 dataset.
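The three steps in the abstract (attribute-guided attention over local features, then cosine matching) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, toy dimensions, and the use of a softmax over spatial locations for the attention module are all assumptions made for the sketch.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def attribute_response(feature_map, attr_queries):
    """Attend over a spatial feature map with text-attribute queries.

    feature_map:  (H*W, D) local visual features.
    attr_queries: (A, D) embedded text-attribute queries.
    Returns (A, D) attribute-guided feature responses.
    """
    # Attention logits: one row of spatial scores per attribute query.
    logits = attr_queries @ feature_map.T            # (A, H*W)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over locations
    return weights @ feature_map                     # (A, D) response maps

def classify(feature_map, class_attrs):
    """Score each class by the mean cosine similarity between its
    attribute queries and the corresponding feature responses."""
    scores = []
    for attrs in class_attrs:                        # one (A, D) matrix per class
        resp = attribute_response(feature_map, attrs)
        cos = np.sum(l2_normalize(resp) * l2_normalize(attrs), axis=1)
        scores.append(cos.mean())
    return int(np.argmax(scores))

# Toy example: 4 spatial locations, 8-dim features, 2 classes x 3 attributes.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 8))
class_attrs = [rng.standard_normal((3, 8)) for _ in range(2)]
pred = classify(fmap, class_attrs)
```

At test time the unseen class whose attribute queries best match their own feature responses wins; only the text attributes of unseen classes are needed, no unseen-class images are used in training.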
Pages: 9067-9073
Page count: 7
Related Papers
50 items total
  • [31] Unpaired robust hashing with noisy labels for zero-shot cross-modal retrieval
    Yong, Kailing
    Shu, Zhenqiu
    Yu, Zhengtao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [32] Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network
    Xu, Xing
    Tian, Jialin
    Lin, Kaiyi
    Lu, Huimin
    Shao, Jie
    Shen, Heng Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [33] Robust zero-shot discrete hashing with noisy labels for cross-modal retrieval
    Yong, Kailing
    Shu, Zhenqiu
    Wang, Hongbin
    Yu, Zhengtao
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024,
  • [34] Zero-shot image classification method base on deep supervised alignment
    Zeng S.-J.
    Pang S.-M.
    Hao W.-Y.
Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (11): 2204 - 2214
  • [35] CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
    Lail, Haoran
    Yao, Qingsong
    Jiang, Zihang
    Wang, Rongsheng
    He, Zhiyang
    Tao, Xiaodong
    Zhou, S. Kevin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 11137 - 11146
  • [36] Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval
    Xu, Xing
    Lu, Huimin
    Song, Jingkuan
    Yang, Yang
    Shen, Heng Tao
    Li, Xuelong
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (06) : 2400 - 2413
  • [37] Towards Zero-shot Learning for End-to-end Cross-modal Translation Models
    Yang, Jichen
    Fang, Kai
    Liao, Minpeng
    Chen, Boxing
    Huang, Zhongqiang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 13078 - 13087
  • [38] Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
    Mercea, Otniel-Bogdan
    Riesch, Lukas
    Koepke, A. Sophia
    Akata, Zeynep
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10543 - 10553
  • [39] Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval
    Li, Chuang
    Fei, Lunke
    Kang, Peipei
    Liang, Jiahao
    Fang, Xiaozhao
    Teng, Shaohua
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 459 - 472
  • [40] INTER-MODALITY FUSION BASED ATTENTION FOR ZERO-SHOT CROSS-MODAL RETRIEVAL
    Chakraborty, Bela
    Wang, Peng
    Wang, Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2648 - 2652