MORAN: A Multi-Object Rectified Attention Network for scene text recognition

Cited by: 318

Authors:

Luo, Canjie [1]

Jin, Lianwen [1,2]

Sun, Zenghui [1]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China

[2] SCUT Zhuhai Inst Modern Ind Innovat, Zhuhai, Peoples R China

Source:

PATTERN RECOGNITION | 2019 / Vol. 90

Funding:

National Key Research and Development Program of China;

Keywords:

Scene text recognition; Optical character recognition; Deep learning;

DOI:

10.1016/j.patcog.2019.01.020

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Irregular text is widely used. However, it is considerably difficult to recognize because of its various shapes and distorted patterns. In this paper, we thus propose a multi-object rectified attention network (MORAN) for general scene text recognition. The MORAN consists of a multi-object rectification network and an attention-based sequence recognition network. The multi-object rectification network is designed for rectifying images that contain irregular text. It decreases the difficulty of recognition and enables the attention-based sequence recognition network to more easily read irregular text. It is trained in a weak supervision way, thus requiring only images and corresponding text labels. The attention-based sequence recognition network focuses on target characters and sequentially outputs the predictions. Moreover, to improve sensitivity of the attention-based sequence recognition network, a fractional pickup method is proposed for an attention-based decoder in the training phase. With the rectification mechanism, the MORAN can read both regular and irregular scene text. Extensive experiments on various benchmarks are conducted, which show that the MORAN achieves state-of-the-art performance. The source code is available. © 2019 Elsevier Ltd. All rights reserved.
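Illustrative note: the abstract names a fractional pickup method for the attention-based decoder but does not define it. The sketch below is a minimal illustration under a stated assumption, namely that at a training step one randomly chosen pair of adjacent attention weights is blended with a random coefficient, so each of the two positions receives part of its neighbour's weight. The function name `fractional_pickup` and its parameters are hypothetical and are not taken from the released source code.

```python
import random


def fractional_pickup(weights, rng=random):
    """Blend one randomly chosen pair of neighbouring attention weights.

    ASSUMPTION: this mixing rule is an illustration of the idea named in
    the abstract, not a confirmed copy of the authors' implementation.
    `weights` holds the attention weights of a single decoding step.
    """
    if len(weights) < 2:
        # Nothing to blend when there is no neighbouring pair.
        return list(weights)

    k = rng.randrange(len(weights) - 1)  # index of the left neighbour
    beta = rng.random()                  # mixing coefficient in [0, 1)

    mixed = list(weights)
    # The pair is mixed with coefficients beta and (1 - beta), so the sum
    # of the two weights (and of the whole vector) is unchanged.
    mixed[k] = beta * weights[k] + (1 - beta) * weights[k + 1]
    mixed[k + 1] = (1 - beta) * weights[k] + beta * weights[k + 1]
    return mixed


if __name__ == "__main__":
    step_weights = [0.1, 0.6, 0.2, 0.1]
    print(fractional_pickup(step_weights, random.Random(0)))
```

Because only two neighbouring entries change and their total is preserved, the perturbed vector remains a valid attention distribution; a transformation of this kind would be applied only in the training phase, as the abstract states.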

Citation

Pages: 109-118

Page count: 10

