CROSS-MODAL ALIGNMENT OF LOCAL AND GLOBAL FEATURES FOR ZERO-SHOT CHINESE CHARACTER RECOGNITION

Times cited: 0

Authors:

Cai, Hongyi [1]

Zhu, Anna [1]

Affiliation:

[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2024

Keywords:

Chinese character recognition; Zero-shot learning; Cross-modal alignment; Local and global feature;

DOI:

10.1109/ICIP51287.2024.10647599

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Chinese character recognition (CCR) is an important area of computer vision because of its complexity and wide range of applications, and the very large number of character categories makes recognizing unseen characters particularly difficult. To address this zero-shot problem, we propose a CLIP-style model that independently extracts features from paired Chinese character images and their Ideographic Description Sequences (IDS) and aligns the two modalities. Alignment is performed on both local and global features. First, we introduce learnable discrete tokens that serve as embeddings shared by the visual and textual modalities and capture the local context of Chinese characters. Next, each radical is encoded to extract local features, which are mapped onto the shared discrete tokens through an attention mechanism. In addition, the entire character is encoded to obtain global features. The model is trained with a contrastive loss that enforces cross-modal alignment. Experimental results show that our method outperforms conventional approaches and performs strongly on zero-shot Chinese character recognition benchmarks.
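The abstract names three mechanisms: local features mapped onto learnable shared tokens through attention, a CLIP-style contrastive loss over paired image and IDS embeddings, and zero-shot recognition by matching an image against the IDS embeddings of candidate characters. The plain-Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. The function names, the attention layout (local feature as query, shared tokens as keys and values), the mean pooling over radicals, the temperature `tau`, and the weight `w_local` combining the global and local terms are all hypothetical. The image and IDS encoders are replaced by precomputed feature vectors.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against the zero vector
    return [x / norm for x in v]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def map_to_shared_tokens(local_feats: List[Vector], tokens: List[Vector]) -> Vector:
    """Map each radical-level feature onto the shared tokens with scaled
    dot-product attention, then mean-pool over radicals.

    Assumption (hypothetical): the local feature is the query and the K shared
    tokens act as both keys and values; the paper's layout may differ.
    """
    d = len(tokens[0])
    pooled = [0.0] * d
    for q in local_feats:
        weights = softmax([dot(q, t) / math.sqrt(d) for t in tokens])
        for k, w in enumerate(weights):
            for j in range(d):
                pooled[j] += w * tokens[k][j]
    return [x / len(local_feats) for x in pooled]


def clip_loss(img: List[Vector], txt: List[Vector], tau: float = 0.07) -> float:
    """Symmetric InfoNCE loss over N paired embeddings (CLIP-style).

    Entry (i, j) is the cosine similarity of image i and IDS embedding j,
    divided by tau; the matching pair (i, i) is the positive.
    """
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    n = len(img)
    logits = [[dot(img[i], txt[j]) / tau for j in range(n)] for i in range(n)]

    def cross_entropy(rows: List[List[float]]) -> float:
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_sum - row[i]  # -log softmax of the positive
        return loss / len(rows)

    columns = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def total_loss(img_global, txt_global, img_local, txt_local, tau=0.07, w_local=1.0):
    # Hypothetical combination: global contrastive term plus weighted local term.
    return clip_loss(img_global, txt_global, tau) + w_local * clip_loss(img_local, txt_local, tau)


def zero_shot_predict(img_emb: Vector, candidate_ids_embs: List[Vector]) -> int:
    """Index of the candidate IDS embedding with the highest cosine similarity."""
    q = l2_normalize(img_emb)
    sims = [dot(q, l2_normalize(c)) for c in candidate_ids_embs]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0]]
    txt = [[0.9, 0.1], [0.1, 0.9]]
    print(clip_loss(img, txt))                 # aligned pairs: low loss
    print(clip_loss(img, txt[::-1]))           # mismatched pairs: higher loss
    print(zero_shot_predict([0.2, 1.0], txt))  # → 1
```

Because recognition reduces to a nearest-neighbour search over IDS embeddings, a character never seen during training can still be predicted as long as its IDS can be encoded; this is what makes the CLIP-style formulation suitable for the zero-shot setting.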


Pages: 2041-2047

Number of pages: 7
