Learning to detect, localize and recognize many text objects in document images from few examples

Cited by: 7
Authors
Moysset, Bastien [1,2]
Kermorvant, Christopher [3]
Wolf, Christian [2,4]
Affiliations
[1] A2iA SA, Paris, France
[2] INSA Lyon, LIRIS, UMR 5205, F-69621 Villeurbanne, France
[3] Teklia SAS, Paris, France
[4] Univ Lyon, CNRS, Lyon, France
Keywords
Text line detection; Neural network; Recurrent; Regression; Local; Document analysis; Line segmentation
DOI
10.1007/s10032-018-0305-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current trend in object detection and localization is to learn predictions with high-capacity deep neural networks trained on very large amounts of annotated data, using substantial processing power. In this work, we specifically target the detection of text in document images and propose a new neural model that directly predicts object coordinates. The particularity of our contribution lies in computing predictions locally, with a new form of local parameter sharing that keeps the overall number of trainable parameters low. Key components of the model are spatial 2D-LSTM recurrent layers, which convey contextual information between regions of the image. We show that this model outperforms the state of the art in applications where training data are less abundant than in the classical setting of natural images and the ImageNet/PASCAL VOC tasks. The proposed model also facilitates the detection of many objects in a single image and can handle inputs of variable size without resizing. To improve the localization precision of the coordinate regressor, we limit the amount of information produced by the local model components and propose two regression strategies: (i) separately predicting the lower-left and upper-right corners of each object bounding box, followed by combinatorial pairing; (ii) predicting only the left side of each object and estimating its right position jointly with text recognition. These strategies lead to good full-page text recognition results on heterogeneous documents. Experiments were performed on a document analysis task: the localization of text lines in the Maurdor dataset.
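The abstract names "combinatorial pairing" of separately predicted lower-left and upper-right corners but does not state the pairing criterion. The following is a minimal sketch under a hypothetical heuristic, not the authors' method: corners below a confidence threshold are discarded, a pair is valid only if the upper-right corner lies to the right of and above the lower-left corner (image coordinates, y increasing downward), and valid pairs are accepted greedily in order of increasing box height so that each corner is used at most once. The names `Corner`, `pair_corners`, and `min_score` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in image coordinates, y increasing downward
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Corner:
    x: float      # horizontal position, increasing to the right
    y: float      # vertical position, increasing downward
    score: float  # hypothetical confidence output by the corner regressor


def pair_corners(lower_left: List[Corner], upper_right: List[Corner],
                 min_score: float = 0.5) -> List[Box]:
    """Hypothetical greedy pairing of corner predictions into boxes.

    Assumption (not from the paper): prefer the flattest geometrically valid
    box, which suits horizontally elongated text lines.
    """
    ll = [c for c in lower_left if c.score >= min_score]
    ur = [c for c in upper_right if c.score >= min_score]

    # Enumerate every geometrically consistent (lower-left, upper-right) pair.
    candidates = []
    for i, a in enumerate(ll):
        for j, b in enumerate(ur):
            if b.x > a.x and b.y < a.y:
                candidates.append((a.y - b.y, i, j))  # cost = box height

    # Accept cheapest pairs first; each corner may belong to one box only.
    candidates.sort()
    used_ll, used_ur = set(), set()
    boxes: List[Box] = []
    for _height, i, j in candidates:
        if i in used_ll or j in used_ur:
            continue
        used_ll.add(i)
        used_ur.add(j)
        a, b = ll[i], ur[j]
        boxes.append((a.x, b.y, b.x, a.y))
    return boxes


if __name__ == "__main__":
    # Two text lines; the third lower-left corner is below the threshold.
    lls = [Corner(10, 50, 0.9), Corner(12, 100, 0.8), Corner(500, 300, 0.1)]
    urs = [Corner(400, 30, 0.95), Corner(380, 80, 0.85)]
    print(pair_corners(lls, urs))  # [(10, 30, 400, 50), (12, 80, 380, 100)]
```

A greedy pass is used here for brevity; an optimal one-to-one assignment over the same cost matrix would be a natural alternative if greedy choices prove unreliable.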
Pages: 161-175
Page count: 15