Multi-orientation scene text detection with scale-guided regression

被引:6
作者
Liang, Min [1 ]
Hou, Jie-Bo [1 ]
Zhu, Xiaobin [1 ]
Yang, Chun [1 ]
Qin, Jingyan [2 ]
Yin, Xu-Cheng [1 ]
机构
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing 100083, Peoples R China
基金
中国国家自然科学基金;
关键词
Scene text detection; Classification; Regression;
D O I
10.1016/j.neucom.2021.07.026
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Existing multi-orientation scene text detection methods generally contain two crucial components: regression prediction for text bounding boxes and classification prediction for text/non-text. However, these methods always regard classification prediction and regression prediction as two independent procedures, neglecting fully exploring their mutual relations. Based on this key observation, we propose an innovative Scale-Guided Regression Module (SRM), specially for multi-orientation scene text detection. Equipped with width-guided kernels and height-guided kernels of different sizes, our SRM can generate a series of scale feature maps of candidate texts by capturing their shape information in classification prediction. The scale feature maps are used to predict the width and height of candidate texts, which can serve as guides for regressing bounding boxes. In this way, the procedures of classification and regression can be coherently integrated. In addition, we adopt IoU loss to train our network and then integrate IoU loss and l(1)-smooth loss for fine-tuning. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. Notably, our method achieves significant improvement of performance on long texts, e.g., on MSRA-TD500, our method outperforms Basemodel with a great margin (4.86% in terms of Recall). (C) 2021 Elsevier B.V. All rights reserved.
引用
收藏
页码:310 / 318
页数:9
相关论文
共 43 条
  • [1] Deng D, 2018, AAAI CONF ARTIF INTE, P6773
  • [2] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [3] Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression
    He, Wenhao
    Zhang, Xu-Yao
    Yin, Fei
    Liu, Cheng-Lin
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (11) : 5406 - 5419
  • [4] Deep Direct Regression for Multi-Oriented Scene Text Detection
    He, Wenhao
    Zhang, Xu-Yao
    Yin, Fei
    Liu, Cheng-Lin
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 745 - 753
  • [5] Hou J., 2021, IEEE T INTELL TRANSP, P1
  • [6] Karatzas D, 2015, PROC INT CONF DOC, P1156, DOI 10.1109/ICDAR.2015.7333942
  • [7] Liao MH, 2020, AAAI CONF ARTIF INTE, V34, P11474
  • [8] TextBoxes plus plus : A Single-Shot Oriented Scene Text Detector
    Liao, Minghui
    Shi, Baoguang
    Bai, Xiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (08) : 3676 - 3690
  • [9] Liao MH, 2017, AAAI CONF ARTIF INTE, P4161
  • [10] Feature Pyramid Networks for Object Detection
    Lin, Tsung-Yi
    Dollar, Piotr
    Girshick, Ross
    He, Kaiming
    Hariharan, Bharath
    Belongie, Serge
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 936 - 944