Multi-orientation scene text detection with scale-guided regression

Cited by: 6

Authors:

Liang, Min [1]

Hou, Jie-Bo [1]

Zhu, Xiaobin [1]

Yang, Chun [1]

Qin, Jingyan [2]

Yin, Xu-Cheng [1]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China

[2] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing 100083, Peoples R China

Source:

NEUROCOMPUTING | 2021 / Vol. 461

Funding:

National Natural Science Foundation of China;

Keywords:

Scene text detection; Classification; Regression;

DOI:

10.1016/j.neucom.2021.07.026

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing multi-orientation scene text detection methods generally contain two crucial components: regression prediction for text bounding boxes and classification prediction for text/non-text. However, these methods typically treat classification prediction and regression prediction as two independent procedures, without fully exploring their mutual relations. Based on this key observation, we propose an innovative Scale-Guided Regression Module (SRM), designed specifically for multi-orientation scene text detection. Equipped with width-guided kernels and height-guided kernels of different sizes, our SRM can generate a series of scale feature maps of candidate texts by capturing their shape information in classification prediction. The scale feature maps are used to predict the width and height of candidate texts, which can serve as guides for regressing bounding boxes. In this way, the procedures of classification and regression can be coherently integrated. In addition, we adopt IoU loss to train our network and then integrate IoU loss and smooth L1 loss for fine-tuning. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. Notably, our method achieves a significant performance improvement on long texts; e.g., on MSRA-TD500, our method outperforms Basemodel by a large margin (4.86% in terms of Recall). © 2021 Elsevier B.V. All rights reserved.
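The two-stage loss schedule mentioned in the abstract (IoU loss for training, then IoU loss combined with smooth L1 loss for fine-tuning) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned boxes in `(x1, y1, x2, y2)` form rather than rotated text boxes, uses the common `-ln(IoU)` form of IoU loss, and the function names and the `weight` balancing factor are hypothetical.

```python
import math


def iou_loss(pred, target, eps=1e-9):
    """-ln(IoU) for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    return -math.log(iou + eps)


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 (Huber-style) loss over the box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def fine_tune_loss(pred, target, weight=1.0):
    """Assumed fine-tuning objective: IoU loss plus weighted smooth L1."""
    return iou_loss(pred, target) + weight * smooth_l1(pred, target)
```

In this sketch, training would minimize `iou_loss` alone and fine-tuning would switch to `fine_tune_loss`; how the paper actually weights the two terms is not stated in the abstract.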

Pages: 310-318

Page count: 9

