Enhanced Chinese scene text recognition model based on cross-domain feature fusion

Cited by: 0
Authors
Ran Cui [1]
Aichun Zhu [2]
Zichen Ding [1]
Affiliations
[1] Xuhai College, China University of Mining and Technology, Jiangsu, Xuzhou
[2] Computer and Information Engineering, Nanjing University of Technology, Jiangsu, Nanjing
Funding
National Natural Science Foundation of China
Keywords
Chinese scene; Cross-domain; Deep learning; Text recognition
DOI
10.1007/s11760-025-04058-y
Abstract
This paper introduces an Enhanced Chinese Scene Text Recognition (CSTR) Model leveraging Cross-Domain Feature Fusion to address the intricate challenges in CSTR, encompassing text deformation, vertical layout, complex structures, and an abundance of similar-looking characters. By incorporating frequency domain analysis and a detail enhancement mechanism, the model disentangles content and directional features in the frequency domain during visual feature extraction, markedly reducing confusion in recognizing similar-looking characters. The model’s architecture seamlessly integrates a ResNet encoder with a Transformer decoder, fusing spatial domain, frequency domain, and detailed features to bolster recognition performance. Comprehensive evaluations reveal that this model surpasses existing approaches, demonstrating superior accuracy and robustness in CSTR tasks, thereby advancing the state-of-the-art in Chinese scene text recognition. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.