Learning and Fusing Multi-Scale Representations for Accurate Arbitrary-Shaped Scene Text Recognition

Cited by: 0
Authors
Li, Mingjun [1]
Xu, Shuo [1]
Su, Feng [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Source
PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023 | 2023
Keywords
scene text recognition; multi-scale; scale space; attention; Transformer; NETWORK
DOI
10.1145/3591106.3592214
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text in natural images carries a wealth of valuable semantic information, but because the appearance of text varies widely, recognizing it accurately is a challenging task. In this work, we propose an arbitrary-shaped scene text recognition method based on learning and fusing multiple representations of text in the scale space with attention mechanisms. Specifically, as distinctive visual features of text often appear at different scales, given an input text image, we generate a family of multi-scale representations that capture complementary appearance characteristics of the text through multiple encoder branches with progressively increasing scale parameters. We further introduce edge map features as a supplementary high-frequency representation with useful text cues. We then refine the multi-scale representations with in-scale and cross-scale attention mechanisms and adaptively aggregate them into an enhanced representation of the text, which effectively improves the text recognition accuracy. The proposed text recognition method achieves competitive results on several scene text benchmarks, demonstrating its effectiveness in recognizing text of various shapes.
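The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the record gives no architectural details, so the sketch substitutes fixed Gaussian smoothing for the learned encoder branches, a gradient-magnitude map for the edge features, and a single softmax-weighted sum for the in-scale and cross-scale attention plus adaptive aggregation. All function names, the sigma values, and the fusion logits are hypothetical.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3.0 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing; output has the same shape as the input."""
    kernel = gaussian_kernel1d(sigma)
    radius = kernel.size // 2
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def edge_map(image: np.ndarray) -> np.ndarray:
    """Gradient magnitude, used here as a stand-in high-frequency cue."""
    gy, gx = np.gradient(image.astype(np.float64))
    return np.hypot(gx, gy)


def multi_scale_stack(image: np.ndarray, sigmas=(1.0, 2.0, 4.0)) -> np.ndarray:
    """One branch per (assumed) scale parameter, plus one edge branch."""
    branches = [gaussian_blur(image, s) for s in sigmas]
    branches.append(edge_map(image))
    return np.stack(branches, axis=0)  # shape: (num_branches, H, W)


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()


def fuse(stack: np.ndarray, logits) -> np.ndarray:
    """Softmax-weighted sum over branches. In a trained model the logits
    would be predicted by an attention module; here they are given."""
    weights = softmax(np.asarray(logits, dtype=np.float64))
    return np.tensordot(weights, stack, axes=1)  # shape: (H, W)


# Toy grayscale "text image" with random content, for shape checking only.
rng = np.random.default_rng(0)
image = rng.random((8, 16))
stack = multi_scale_stack(image)
fused = fuse(stack, logits=np.zeros(stack.shape[0]))
print(stack.shape, fused.shape)
```

With all logits equal, the fusion reduces to a plain average of the branches; unequal logits shift the weight toward particular scales, which is the role the abstract assigns to adaptive aggregation.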
Pages: 353-361
Page count: 9