CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Cited by: 0

Authors:

Cai, Hongyi [1]

Zhu, Anna [1]

Affiliations:

[1] Wuhan University of Technology, School of Computer Science and Artificial Intelligence, Wuhan, People's Republic of China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024

Keywords:

Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature;

DOI:

10.1109/ICIP51287.2024.10647599

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Chinese character recognition (CCR) is an important problem in computer vision because of its complexity and its many applications. The very large number of character categories makes recognizing unseen characters especially difficult. To address this zero-shot setting, we propose a CLIP-style model that independently extracts features from aligned Chinese character images and Ideographic Description Sequences (IDS) and aligns the two modalities. Our approach covers both local and global feature alignment. First, we introduce learnable discrete tokens that serve as shared embeddings for the visual and textual modalities and capture the local context of Chinese characters. Next, each radical is encoded to extract local features, which are mapped to the shared discrete tokens via attention mechanisms. In addition, the entire character is encoded to obtain global features. Training uses a contrastive loss to drive cross-modal alignment. Experimental results show that our method outperforms conventional approaches on zero-shot Chinese character recognition benchmarks.
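The abstract names two mechanisms without giving formulas: attention that maps local (radical) features onto shared tokens, and a contrastive loss for cross-modal alignment. The following is a minimal sketch of what these could look like, assuming scaled dot-product attention and a CLIP-style symmetric InfoNCE loss. The function names, the temperature value, and the array shapes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def attend_to_tokens(local_feats, tokens):
    """Map local features (n, d) onto shared tokens (k, d).

    Assumption: plain scaled dot-product attention, with the local
    features as queries and the shared tokens as keys and values.
    """
    scores = local_feats @ tokens.T / np.sqrt(tokens.shape[1])  # (n, k)
    weights = np.exp(log_softmax(scores, axis=1))               # rows sum to 1
    return weights @ tokens                                     # (n, d)


def symmetric_contrastive_loss(img, txt, temperature=0.07):
    """CLIP-style loss: row i of img and row i of txt form a positive pair.

    Assumption: the temperature 0.07 is a common default, not a value
    reported in the abstract.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                          # (n, n)
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()    # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()    # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(8, 16))                      # stand-in image embeddings
    ids_feats = image_feats + 0.01 * rng.normal(size=(8, 16))   # stand-in IDS embeddings
    print("aligned pairs:   ", symmetric_contrastive_loss(image_feats, ids_feats))
    print("mismatched pairs:", symmetric_contrastive_loss(image_feats, np.roll(ids_feats, 1, axis=0)))
```

In this sketch the loss is small when each image embedding is closest to its own IDS embedding and grows when the pairing is broken, which is the behavior a contrastive alignment objective is meant to produce.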


Pages: 2041-2047

Page count: 7
