CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Cited by: 0
Authors
Cai, Hongyi [1]
Zhu, Anna [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024
Keywords
Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature;
DOI
10.1109/ICIP51287.2024.10647599
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Chinese character recognition (CCR) is a pivotal task in computer vision because of its complexity and diverse applications, and the very large number of character categories makes recognizing unseen characters especially challenging. To address this zero-shot problem, we propose a CLIP-style model that independently extracts features from aligned Chinese character images and Ideographic Description Sequences (IDS) and aligns them across modalities. Our approach combines local and global feature alignment. First, we introduce learnable discrete tokens that serve as shared embeddings for the visual and textual modalities and capture the local context of Chinese characters. Next, each radical is encoded to extract local features, which are mapped to the shared discrete tokens via attention mechanisms. In addition, the entire character is encoded to obtain global features. Training uses a contrastive loss to drive cross-modal alignment. Experimental results show that our method outperforms conventional approaches on zero-shot Chinese character recognition benchmarks.
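The abstract names a contrastive loss for cross-modal alignment but does not give its form. The sketch below assumes the symmetric InfoNCE loss commonly used in CLIP-style training, applied to global features only; the function names, the temperature of 0.07, and the toy feature vectors are illustrative assumptions, not details from the paper. The local alignment through shared discrete tokens is not modeled here.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(image_feats, ids_feats, temperature=0.07):
    """Assumed CLIP-style loss: row k of each input is a matching pair.

    image_feats: global features of character images (hypothetical encoder output)
    ids_feats:   global features of the corresponding IDS (hypothetical encoder output)
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in ids_feats]
    n = len(imgs)
    # Cosine similarity between every image and every IDS, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-IDS direction: the correct IDS for image k is column k.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # IDS-to-image direction: same computation on the transposed matrix.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


def zero_shot_predict(image_feat, candidate_ids_feats):
    """Return the index of the candidate IDS most similar to the image."""
    img = l2_normalize(image_feat)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(c)))
            for c in candidate_ids_feats]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy example: two characters with 2-D features.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
mismatched = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                        [[0.0, 1.0], [1.0, 0.0]])
```

Under these assumptions, an unseen character could be recognized at inference time by encoding its IDS as a candidate and calling `zero_shot_predict`, so no image of that character is needed during training.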
Pages: 2041-2047
Number of pages: 7