A Cross-Modal Alignment for Zero-Shot Image Classification

Cited by: 5
Authors
Wu, Lu [1 ,2 ]
Wu, Chenyu [2 ]
Guo, Han [3 ]
Zhao, Zhihao [2 ]
Affiliations
[1] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen 518000, Peoples R China
[2] Wuhan Univ Technol, Sch Informat Engn, Wuhan 430070, Peoples R China
[3] Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430079, Peoples R China
Keywords
Visualization; Semantics; Training data; Feature extraction; Object recognition; Monitoring; Image classification; Cross-modal alignment; zero-shot image classification; text attribute query; cosine similarity;
DOI
10.1109/ACCESS.2023.3237966
CLC Classification Number
TP [Automation and Computer Technology];
Subject Classification Code
0812;
Abstract
Unlike mainstream classification methods that rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key idea is to use text-attribute queries learned from seen classes to guide local feature responses on unseen classes. First, an encoder aligns visual features with their corresponding text attributes. Second, an attention module produces response maps by activating feature maps with the text-attribute queries. Finally, a cosine distance metric measures how well each text attribute matches its corresponding feature response. Experimental results on the CUB-200-2011 dataset show that the method outperforms existing embedding-based zero-shot learning methods as well as generative methods.
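To make the three steps in the abstract concrete, the following is a minimal PyTorch sketch (not the authors' released implementation) of attribute-query attention followed by cosine-similarity matching; the module name, projection layers, embedding size, and tensor shapes are assumptions introduced here for illustration only.

```python
# Illustrative sketch: a text-attribute query attends over spatial visual
# features, and cosine similarity scores how well the response matches the query.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeQueryAttention(nn.Module):
    """Hypothetical module aligning visual feature maps with text-attribute queries."""
    def __init__(self, vis_dim: int, attr_dim: int, embed_dim: int = 512):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, embed_dim)    # visual projection head
        self.attr_proj = nn.Linear(attr_dim, embed_dim)  # text-attribute projection head

    def forward(self, feat_maps: torch.Tensor, attr_emb: torch.Tensor) -> torch.Tensor:
        # feat_maps: (B, H*W, vis_dim) flattened spatial features from a backbone
        # attr_emb:  (C, attr_dim) one text-attribute vector per class
        v = self.vis_proj(feat_maps)                  # (B, H*W, D)
        q = self.attr_proj(attr_emb)                  # (C, D)
        # Attribute queries activate spatial locations -> per-class response maps
        attn = torch.einsum('bnd,cd->bcn', v, q).softmax(dim=-1)   # (B, C, H*W)
        # Aggregate visual features under each class's response map
        resp = torch.einsum('bcn,bnd->bcd', attn, v)  # (B, C, D)
        # Cosine similarity between each response and its attribute query
        return F.cosine_similarity(resp, q.unsqueeze(0), dim=-1)   # (B, C)

# Example usage: predict the class whose attribute query best matches its response.
model = AttributeQueryAttention(vis_dim=2048, attr_dim=312)
feats = torch.randn(4, 49, 2048)   # e.g., a flattened 7x7 backbone feature map
attrs = torch.randn(200, 312)      # e.g., CUB-200-2011 class attribute vectors
pred = model(feats, attrs).argmax(dim=-1)
```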
Pages: 9067 - 9073
Number of pages: 7
Related Papers
50 records in total
  • [11] Cross-Modal Attention Alignment Network with Auxiliary Text Description for Zero-Shot Sketch-Based Image Retrieval
    Su, Hanwen
    Song, Ge
    Huang, Kai
    Wang, Jiyan
    Yang, Ming
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VI, 2024, 15021 : 52 - 65
  • [12] Cross-modal Representation Learning for Zero-shot Action Recognition
    Lin, Chung-Ching
    Lin, Kevin
    Wang, Lijuan
    Liu, Zicheng
    Li, Linjie
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19946 - 19956
  • [13] Manifold regularized cross-modal embedding for zero-shot learning
    Ji, Zhong
    Yu, Yunlong
    Pang, Yanwei
    Guo, Jichang
    Zhang, Zhongfei
    INFORMATION SCIENCES, 2017, 378 : 48 - 58
  • [14] Cross-modal propagation network for generalized zero-shot learning
    Guo, Ting
    Liang, Jianqing
    Liang, Jiye
    Xie, Guo-Sen
    PATTERN RECOGNITION LETTERS, 2022, 159 : 125 - 131
  • [15] Cross-modal Self-distillation for Zero-shot Sketch-based Image Retrieval
    Tian J.-L.
    Xu X.
    Shen F.-M.
    Shen H.-T.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (09):
  • [16] Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval
    Deng, Cheng
    Xu, Xinxun
    Wang, Hao
    Yang, Muli
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 8892 - 8902
  • [17] CHOP: An orthogonal hashing method for zero-shot cross-modal retrieval
    Yuan, Xu
    Wang, Guangze
    Chen, Zhikui
    Zhong, Fangming
    PATTERN RECOGNITION LETTERS, 2021, 145 : 247 - 253
  • [18] A Simplified Framework for Zero-shot Cross-Modal Sketch Data Retrieval
    Chaudhuri, Ushasi
    Banerjee, Biplab
    Bhattacharya, Avik
    Datcu, Mihai
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 699 - 706
  • [19] DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning
    Chen, Zhuo
    Huang, Yufeng
    Chen, Jiaoyan
    Geng, Yuxia
    Zhang, Wen
    Fang, Yin
    Pan, Jeff Z.
    Chen, Huajun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 405 - 413
  • [20] Discrete asymmetric zero-shot hashing with application to cross-modal retrieval
    Shu, Zhenqiu
    Yong, Kailing
    Yu, Jun
    Gao, Shengxiang
    Mao, Cunli
    Yu, Zhengtao
    NEUROCOMPUTING, 2022, 511 : 366 - 379