CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Cited by: 0
Authors
Cai, Hongyi [1]
Zhu, Anna [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024
Keywords
Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature;
DOI
10.1109/ICIP51287.2024.10647599
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Chinese character recognition (CCR) is a pivotal domain in computer vision due to its complexity and diverse applications, and the very large number of character categories makes identifying unseen characters especially challenging. To address this zero-shot problem, we propose a CLIP-style model that independently extracts features from paired Chinese character images and Ideographic Description Sequences (IDS) and aligns them across modalities. Our approach covers both local and global feature alignment. First, we introduce learnable discrete tokens that serve as shared embeddings for the visual and textual modalities and capture the local context of Chinese characters. Next, each radical is encoded to extract local features, which are mapped to the shared discrete tokens via attention mechanisms. In addition, the entire character is encoded to obtain global features. Training uses a contrastive loss to facilitate cross-modal alignment. Experimental results confirm our method's superiority over conventional approaches, demonstrating remarkable performance on zero-shot Chinese character recognition benchmarks.
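The abstract names a CLIP-style contrastive objective but gives no formula, so the following is only a minimal sketch of the generic symmetric contrastive (InfoNCE) loss over a batch of matched image/IDS feature pairs, plus nearest-IDS matching at inference. The function names, the temperature value of 0.07, and the nearest-neighbor inference rule are assumptions for illustration, not details taken from the paper; the local-token attention step is not modeled here.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_loss(image_feats, ids_feats, temperature=0.07):
    """Symmetric contrastive loss; pair i of each list is assumed to match.

    temperature=0.07 is an assumed default, not a value from the paper.
    """
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in ids_feats]
    n = len(img)
    # logits[i][j]: temperature-scaled cosine similarity of image i and IDS j
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-IDS direction: the correct IDS for image i sits in column i
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # IDS-to-image direction: same computation on the transposed matrix
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

def predict(image_feat, candidate_ids_feats):
    """Return the index of the candidate IDS embedding most similar to the image."""
    q = l2_normalize(image_feat)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(c)))
            for c in candidate_ids_feats]
    return max(range(len(sims)), key=sims.__getitem__)
```

Under this sketch, an unseen character would be recognized by embedding the IDS of every candidate class and calling `predict` on the image embedding, which requires no training images for those classes.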
Pages: 2041-2047
Number of pages: 7