Multi-orientation scene text detection with scale-guided regression

Cited by: 6
Authors
Liang, Min [1]
Hou, Jie-Bo [1]
Zhu, Xiaobin [1]
Yang, Chun [1]
Qin, Jingyan [2]
Yin, Xu-Cheng [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene text detection; Classification; Regression;
DOI
10.1016/j.neucom.2021.07.026
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing multi-orientation scene text detection methods generally contain two crucial components: regression prediction for text bounding boxes and classification prediction for text/non-text. However, these methods typically treat classification prediction and regression prediction as two independent procedures and do not fully exploit their mutual relations. Based on this observation, we propose a Scale-Guided Regression Module (SRM) designed specifically for multi-orientation scene text detection. Equipped with width-guided and height-guided kernels of different sizes, the SRM generates a series of scale feature maps of candidate texts by capturing their shape information in classification prediction. The scale feature maps are used to predict the width and height of candidate texts, which then serve as guides for regressing bounding boxes. In this way, the procedures of classification and regression are coherently integrated. In addition, we train our network with IoU loss and then combine IoU loss and smooth-L1 loss for fine-tuning. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. Notably, our method achieves a significant performance improvement on long texts; e.g., on MSRA-TD500, it outperforms Basemodel by a large margin (4.86% in terms of Recall). © 2021 Elsevier B.V. All rights reserved.
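The two-stage loss schedule mentioned in the abstract (IoU loss for training, then IoU loss plus smooth-L1 loss for fine-tuning) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned boxes given as (x1, y1, x2, y2), whereas the paper targets multi-oriented boxes; it assumes the common -log(IoU) form of the IoU loss; and the function names and the `weight` and `beta` parameters are hypothetical choices made for illustration only.

```python
import math


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2). Axis alignment is an assumption."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target, eps=1e-6):
    """-log(IoU) form of the IoU loss (assumed form; eps avoids log(0))."""
    return -math.log(iou(pred, target) + eps)


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth-L1 loss over box coordinates; beta is a hypothetical threshold."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def fine_tune_loss(pred, target, weight=1.0):
    """Fine-tuning objective: IoU loss plus weighted smooth-L1 (weight is hypothetical)."""
    return iou_loss(pred, target) + weight * smooth_l1(pred, target)


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    print("IoU:", iou(pred, target))
    print("stage 1 (IoU loss):", iou_loss(pred, target))
    print("stage 2 (IoU + smooth-L1):", fine_tune_loss(pred, target))
```

In this sketch the first training stage would minimise `iou_loss` alone, and the fine-tuning stage would switch to `fine_tune_loss`; how the two terms are actually weighted in the paper is not stated in the abstract.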
Pages: 310-318
Number of pages: 9
Related papers
43 in total
  • [11] SemiText: Scene text detection with semi-supervised learning
    Liu, Juhua
    Zhong, Qihuang
    Yuan, Yuan
    Su, Hai
    Du, Bo
    [J]. NEUROCOMPUTING, 2020, 407 : 343 - 353
  • [12] SSD: Single Shot MultiBox Detector
    Liu, Wei
    Anguelov, Dragomir
    Erhan, Dumitru
    Szegedy, Christian
    Reed, Scott
    Fu, Cheng-Yang
    Berg, Alexander C.
    [J]. COMPUTER VISION - ECCV 2016, PT I, 2016, 9905 : 21 - 37
  • [13] FOTS: Fast Oriented Text Spotting with a Unified Network
    Liu, Xuebo
    Liang, Ding
    Yan, Shi
    Chen, Dagui
    Qiao, Yu
    Yan, Junjie
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5676 - 5685
  • [14] Long J, 2015, PROC CVPR IEEE, P3431, DOI 10.1109/CVPR.2015.7298965
  • [15] TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
    Long, Shangbang
    Ruan, Jiaqiang
    Zhang, Wenjie
    He, Xin
    Wu, Wenhao
    Yao, Cong
    [J]. COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 19 - 35
  • [16] Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
    Lyu, Pengyuan
    Yao, Cong
    Wu, Wenhao
    Yan, Shuicheng
    Bai, Xiang
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7553 - 7563
  • [17] Arbitrary-Oriented Scene Text Detection via Rotation Proposals
    Ma, Jianqi
    Shao, Weiyuan
    Ye, Hao
    Wang, Li
    Wang, Hong
    Zheng, Yingbin
    Xue, Xiangyang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (11) : 3111 - 3122
  • [18] ICDAR2017 Robust Reading Challenge on Multi-lingual Scene Text Detection and Script Identification - RRC-MLT
    Nayef, Nibal
    Yin, Fei
    Bizid, Imen
    Choi, Hyunsoo
    Feng, Yuan
    Karatzas, Dimosthenis
    Luo, Zhenbo
    Pal, Umapada
    Rigaud, Christophe
    Chazalon, Joseph
    Khlif, Wafa
    Luqman, Muhammad Muzzamil
    Burie, Jean-Christophe
    Liu, Cheng-Lin
    Ogier, Jean-Marc
    [J]. 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1454 - 1459
  • [19] Redmon J, 2016, P IEEE C COMP VIS PA, P779, DOI 10.1109/CVPR.2016.91
  • [20] Redmon J, 2018, Arxiv, DOI arXiv:1804.02767