CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Cited by: 0
Authors
Cai, Hongyi [1]
Zhu, Anna [1]
Affiliation
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024
Keywords
Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature
DOI
10.1109/ICIP51287.2024.10647599
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Chinese character recognition (CCR) is a pivotal domain in computer vision due to its complexity and diverse applications, especially given the extensive character categories posing challenges in identifying unseen characters. Addressing the zero-shot hurdle, we propose a CLIP-style model, which independently extracts features from aligned Chinese character images and Ideographic Description Sequences (IDS), achieving cross-modal alignment. Our approach encompasses local and global feature alignment. Initially, we introduce learnable discrete tokens to represent shared embeddings for visual and textual modalities, capturing the local context of Chinese characters. Then, encoding each radical extracts local features, mapped to shared discrete tokens via attention mechanisms. Additionally, encoding the entire character obtains global features. Training utilizes contrastive loss to facilitate cross-modal alignment. Experimental results confirm our method's superiority over conventional approaches, demonstrating remarkable performance on zero-shot Chinese character recognition benchmarks.
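The abstract names two mechanisms that a small example can make concrete: mapping local (radical-level) features onto shared tokens via attention, and training with a contrastive loss across the image and IDS modalities. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The function names, the scaled dot-product form of the attention, the symmetric cross-entropy form of the loss, and the temperature value of 0.07 are all assumptions chosen for illustration; the record gives no such details.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def attend_to_shared_tokens(local_feats, shared_tokens):
    """Re-express local features as attention-weighted mixes of shared tokens.

    local_feats:   (n_local, d) array, e.g. one row per radical (hypothetical layout).
    shared_tokens: (n_tokens, d) array standing in for the learnable discrete tokens.
    Returns an (n_local, d) array.
    """
    d = local_feats.shape[-1]
    scores = local_feats @ shared_tokens.T / np.sqrt(d)  # (n_local, n_tokens)
    weights = np.exp(log_softmax(scores, axis=-1))       # each row sums to 1
    return weights @ shared_tokens


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style loss: row i of image_emb is the positive pair of row i of text_emb.

    The temperature is an assumed value, not one taken from the paper.
    """
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature        # (batch, batch)
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_img = rng.normal(size=(8, 16))          # stand-in global image features
    tokens = rng.normal(size=(4, 16))              # stand-in shared tokens
    local = attend_to_shared_tokens(rng.normal(size=(5, 16)), tokens)
    print("local features on shared tokens:", local.shape)
    print("loss, matched pairs :", symmetric_contrastive_loss(global_img, global_img))
    print("loss, shuffled pairs:", symmetric_contrastive_loss(
        global_img, np.roll(global_img, 1, axis=0)))
```

With matched pairs on the diagonal the loss is low, and it rises when the pairing is shuffled. That is the behaviour a contrastive objective relies on to pull an image embedding toward the embedding of its own IDS and away from the others in the batch.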
Pages: 2041-2047
Number of pages: 7