End-to-End Handwritten Paragraph Text Recognition Using a Vertical Attention Network

Cited by: 55
Authors
Coquenet, Denis [1, 2, 3]
Chatelain, Clement [3, 4]
Paquet, Thierry [1, 2, 3]
Affiliations
[1] LITIS EA 4108, F-76800 Saint Etienne Du Rouvray, France
[2] Univ Rouen Normandy, F-76000 Rouen, France
[3] Normandy Univ, F-14032 Caen, France
[4] INSA Rouen Normandy, F-76800 Saint Etienne Du Rouvray, France
Keywords
Seq2Seq model; hybrid attention; segmentation-free; paragraph handwriting recognition; fully convolutional network; encoder-decoder; optical character recognition; line segmentation; Markov models; hybrid
DOI
10.1109/TPAMI.2022.3144899
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally handled by two models: one for line segmentation and one for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. This model is designed to iteratively process a paragraph image line by line. It can be split into three modules. An encoder generates feature maps from the whole paragraph image. Then, an attention module recurrently generates a vertical weighted mask that focuses on the features of the current text line. In this way, it performs a kind of implicit line segmentation. For each set of text line features, a decoder module recognizes the associated character sequence, leading to the recognition of the whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% for RIMES, 4.45% for IAM and 3.59% for READ 2016. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.
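The vertical weighted mask described in the abstract can be illustrated with a minimal sketch. It assumes a feature map stored as nested lists of shape H x W x C and one attention score per feature row. The scores are passed in by hand here, whereas in the model they would come from a learned attention module that the abstract does not detail. The function names `softmax` and `vertical_attention` are hypothetical and are not taken from the authors' repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vertical_attention(features, row_scores):
    """Collapse the vertical axis of a feature map with soft attention.

    features:   nested lists of shape H x W x C (assumed layout).
    row_scores: H unnormalized scores, one per feature row
                (assumed to be given; a real model would learn them).
    Returns (weights, line_features), where weights has length H and
    line_features has shape W x C.
    """
    height = len(features)
    width = len(features[0])
    channels = len(features[0][0])
    weights = softmax(row_scores)
    line_features = [
        [
            sum(weights[h] * features[h][w][c] for h in range(height))
            for c in range(channels)
        ]
        for w in range(width)
    ]
    return weights, line_features


if __name__ == "__main__":
    # Toy map: 3 rows, 2 columns, 1 channel; each row holds a constant value.
    toy = [[[1.0], [1.0]], [[2.0], [2.0]], [[3.0], [3.0]]]
    # A high score on the middle row makes the mask focus on that "line".
    weights, line = vertical_attention(toy, [0.0, 10.0, 0.0])
    print(weights)
    print(line)
```

Processing a paragraph line by line would then amount to repeating this step with a new set of row scores at each iteration and passing each resulting W x C sequence to a line-level decoder.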
Pages: 508-524
Page count: 17