CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Cited by: 0
Authors
Cai, Hongyi [1]
Zhu, Anna [1]
Affiliations
[1] Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan, People's Republic of China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024
Keywords
Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature
DOI
10.1109/ICIP51287.2024.10647599
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Chinese character recognition (CCR) is an important problem in computer vision because of its complexity and diverse applications; the very large number of character categories makes recognizing unseen characters especially challenging. To address this zero-shot setting, we propose a CLIP-style model that independently extracts features from aligned Chinese character images and Ideographic Description Sequences (IDS) and aligns the two modalities. Our approach covers both local and global feature alignment. First, we introduce learnable discrete tokens that serve as shared embeddings for the visual and textual modalities and capture the local context of Chinese characters. Next, each radical is encoded to extract local features, which are mapped to the shared discrete tokens via attention mechanisms. In addition, the entire character is encoded to obtain global features. Training uses a contrastive loss to facilitate cross-modal alignment. Experimental results show that our method outperforms conventional approaches on zero-shot Chinese character recognition benchmarks.
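The contrastive objective mentioned in the abstract can be illustrated with a minimal sketch of a symmetric, CLIP-style loss over a batch of paired image and IDS feature vectors. This is not the authors' implementation: the function names, the temperature value of 0.07, and the toy two-dimensional features are assumptions chosen for illustration, and the record does not say how the paper combines its local and global alignment terms.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(image_feats, ids_feats, temperature=0.07):
    """Symmetric cross-entropy over the image-to-IDS similarity matrix.

    Row i of image_feats and row i of ids_feats are assumed to describe the
    same character (a positive pair); every other pairing in the batch is
    treated as a negative. The temperature is a hypothetical default.
    """
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in ids_feats]
    n = len(img)
    # logits[i][j]: scaled cosine similarity between image i and IDS j
    logits = [
        [sum(a * b for a, b in zip(img[i], txt[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-IDS direction: the correct IDS for image i sits at column i
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # IDS-to-image direction: transpose the matrix and repeat
    logits_t = [list(col) for col in zip(*logits)]
    loss_t2i = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy features: two characters, two-dimensional embeddings (illustrative only)
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each image matches its own IDS embedding and large when the pairs are swapped, which is the behavior a contrastive objective relies on to pull matching modalities together.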
Pages: 2041-2047
Page count: 7