MORAN: A Multi-Object Rectified Attention Network for scene text recognition

Cited by: 318

Authors:

Luo, Canjie [1]

Jin, Lianwen [1,2]

Sun, Zenghui [1]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China

[2] SCUT Zhuhai Inst Modern Ind Innovat, Zhuhai, Peoples R China

Source:

PATTERN RECOGNITION | 2019 / Vol. 90

Funding:

National Key Research and Development Program of China;

Keywords:

Scene text recognition; Optical character recognition; Deep learning;

DOI:

10.1016/j.patcog.2019.01.020

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Irregular text is widely used. However, it is considerably difficult to recognize because of its various shapes and distorted patterns. In this paper, we thus propose a multi-object rectified attention network (MORAN) for general scene text recognition. The MORAN consists of a multi-object rectification network and an attention-based sequence recognition network. The multi-object rectification network is designed for rectifying images that contain irregular text. It decreases the difficulty of recognition and enables the attention-based sequence recognition network to more easily read irregular text. It is trained in a weak supervision way, thus requiring only images and corresponding text labels. The attention-based sequence recognition network focuses on target characters and sequentially outputs the predictions. Moreover, to improve sensitivity of the attention-based sequence recognition network, a fractional pickup method is proposed for an attention-based decoder in the training phase. With the rectification mechanism, the MORAN can read both regular and irregular scene text. Extensive experiments on various benchmarks are conducted, which show that the MORAN achieves state-of-the-art performance. The source code is available. © 2019 Elsevier Ltd. All rights reserved.

Pages: 109-118

Page count: 10
