Class-Aware Mask-guided feature refinement for scene text recognition

Cited by: 6
Authors
Yang, Mingkun [1]
Yang, Biao [2]
Liao, Minghui [5]
Zhu, Yingying [1,3]
Bai, Xiang [4]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan, Peoples R China
[3] Huazhong Univ Sci & Technol, Hubei Key Lab Smart Internet Technol, Wuhan, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[5] Huawei Inc, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text recognition; Text segmentation; Multi-task learning; Feature fusion; ATTENTION NETWORK;
DOI
10.1016/j.patcog.2023.110244
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text recognition is a rapidly developing field that faces numerous challenges arising from the complexity and diversity of scene text, including complex backgrounds, diverse fonts, flexible arrangements, and accidental occlusions. In this paper, we propose a novel approach called Class-Aware Mask-guided feature refinement (CAM) to address these challenges. Our approach introduces canonical class-aware glyph masks, generated from a standard font, that suppress background and text-style noise and thereby make features more discriminative. Additionally, we design a feature alignment and fusion module that incorporates the canonical mask guidance to further refine features for text recognition. By improving the alignment between the canonical mask feature and the text feature, the module makes the fusion more effective, which in turn improves recognition performance. We first evaluate CAM on six standard text recognition benchmarks to demonstrate its effectiveness. Furthermore, CAM outperforms the state-of-the-art method by an average of 4.1% across six more challenging datasets, despite using a smaller model. Our study highlights the importance of canonical mask guidance and aligned feature refinement for robust scene text recognition. Code will be available at https://github.com/MelosY/CAM.
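The abstract states only that the canonical class-aware glyph masks are "generated from a standard font"; it does not say how. As a minimal sketch of the general idea, assuming a toy setup that is NOT the CAM implementation, the code below lays out glyph bitmaps left to right and labels every foreground pixel with the class index of its character (0 for background), so the mask encodes both where each glyph is and which character it is. TOY_FONT (3x3 bitmaps standing in for glyphs rasterized from a real font), CHARSET, CLASS_ID, the 1-based class numbering, and class_aware_mask are all hypothetical names and choices made for illustration.

```python
# Hypothetical toy "font": 3x3 binary bitmaps standing in for glyphs that a
# real pipeline would rasterize from a standard font file.
TOY_FONT = {
    "A": [[0, 1, 0],
          [1, 1, 1],
          [1, 0, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

# Assumed class indexing: 0 is background, characters are numbered from 1.
CHARSET = "ALT"
CLASS_ID = {ch: i + 1 for i, ch in enumerate(CHARSET)}


def class_aware_mask(text, font=TOY_FONT, gap=1):
    """Lay glyphs out left to right and label each foreground pixel with the
    class index of the character it belongs to (background pixels stay 0)."""
    height = len(next(iter(font.values())))
    rows = [[] for _ in range(height)]
    for pos, ch in enumerate(text):
        glyph = font[ch]
        cls = CLASS_ID[ch]
        for r in range(height):
            if pos > 0:
                rows[r].extend([0] * gap)  # background gap between glyphs
            rows[r].extend(cls if px else 0 for px in glyph[r])
    return rows


if __name__ == "__main__":
    for row in class_aware_mask("TAL"):
        print(row)
```

Because the mask is rendered from one fixed font rather than taken from the input image, it is the same for every image containing the same string, which is one plausible reading of why such a target would be insensitive to background and text-style variation.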
Pages: 14