ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

Cited by: 588
Authors
Shi, Baoguang [1]
Yang, Mingkun [1]
Wang, Xinggang [1]
Lyu, Pengyuan [1]
Yao, Cong [2]
Bai, Xiang [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
[2] Megvii Face Inc, Beijing 100190, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Scene text recognition; thin-plate spline; image transformation; sequence-to-sequence learning; LOCALIZATION; SEQUENCE;
DOI
10.1109/TPAMI.2018.2848939
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.
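The abstract names a Thin-Plate Spline (TPS) transformation as the core of the rectification network. As a rough illustration of what such a transformation computes, the following is a minimal NumPy sketch of generic 2D TPS interpolation between two sets of control points. It is not the authors' implementation: the function names `tps_kernel`, `fit_tps`, and `apply_tps` are hypothetical, the kernel U(r) = r^2 log(r^2) is one common convention, and the control points here are supplied by hand, whereas in a learned rectifier they would presumably be predicted by a network.

```python
import numpy as np


def tps_kernel(r2):
    """Radial basis U(r) = r^2 * log(r^2), evaluated on squared distances.

    U(0) is defined as 0 to avoid log(0).
    """
    out = np.zeros_like(r2, dtype=float)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst):
    """Solve for TPS parameters mapping control points src -> dst.

    src, dst: float arrays of shape (K, 2); src points must be distinct
    and not all collinear. Returns an array of shape (K + 3, 2):
    K radial weights followed by 3 affine coefficients, per output axis.
    """
    k = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = tps_kernel(d2)                      # (K, K) radial part
    P = np.hstack([np.ones((k, 1)), src])   # (K, 3) affine part [1, x, y]

    # Standard TPS linear system: [[K, P], [P^T, 0]] @ params = [dst, 0]
    A = np.zeros((k + 3, k + 3))
    A[:k, :k] = K
    A[:k, k:] = P
    A[k:, :k] = P.T
    b = np.zeros((k + 3, 2))
    b[:k] = dst
    return np.linalg.solve(A, b)


def apply_tps(params, src, pts):
    """Map arbitrary points pts (N, 2) through the fitted TPS."""
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = tps_kernel(d2)                               # (N, K)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])  # (N, 3)
    return U @ params[:-3] + P @ params[-3:]
```

In a rectification setting, one would typically fit the spline from a fixed grid on the output image to control points on the input image, map every output pixel coordinate through `apply_tps`, and sample the input image at the resulting locations; that sampling step is omitted here.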
Pages: 2035-2048
Number of pages: 14