ResneSt-Transformer: Joint attention segmentation-free for end-to-end handwriting paragraph recognition model

Cited by: 3
Authors
Hamdan, Mohammed [1 ]
Cheriet, Mohamed [1 ]
Affiliations
[1] Univ Quebec ETS, Synchromedia Lab, Syst Engn, 1100 Notre Dame St W, Montreal, PQ H3C 1K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Handwritten text recognition; ResneSt; Transformer; Self attention; Segmentation-free; Lexicon-free; Paragraph transcription; OCR; Encoder-decoder; Image-seq; TEXT; IMAGE;
DOI
10.1016/j.array.2023.100300
Chinese Library Classification
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
Offline handwritten text recognition (HTR) typically relies on segmented text-line images for training and transcription. However, acquiring line-level position and transcript information can be challenging and time-consuming, while automatic line segmentation algorithms are prone to errors that impede the recognition phase. To address these issues, we introduce a state-of-the-art solution that integrates vision and language models using efficient split- and multi-head-attention neural networks, referred to as joint attention (ResneSt-Transformer), for end-to-end recognition of handwritten paragraphs. Our novel one-stage, segmentation-free pipeline employs joint attention mechanisms to process paragraph images in an end-to-end trainable manner. The pipeline comprises three modules, with the output of each serving as the input to the next. First, a feature extraction module employing a CNN with a split-attention mechanism (ResNeSt-50) is applied. Next, an encoder module containing four transformer layers generates robust representations of the entire paragraph image. Finally, a decoder module with six transformer layers constructs weighted masks. The encoder and decoder modules incorporate multi-head self-attention and positional encoding, enabling the model to concentrate on specific feature maps at the current time step. By leveraging joint attention and a segmentation-free approach, the network computes split-attention weights on the visual representation, performing implicit line segmentation. This strategy is a substantial step toward end-to-end transcription of entire paragraphs. Experiments on paragraph-level benchmark datasets, including the RIMES, IAM, and READ 2016 test sets, demonstrate competitive results against recent paragraph-level models while maintaining reduced complexity.
The code and pre-trained models are available in our GitHub repository: HTTPSlink.
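The encoder and decoder modules described above both build on two standard transformer ingredients: scaled dot-product self-attention and sinusoidal positional encoding. As an illustration only (the paper's actual implementation, feature dimensions, and multi-head projections are not reproduced here; the toy sizes below are hypothetical), a minimal framework-free sketch of these two building blocks:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, row by row.
    d_k = len(K[0])
    out, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # how much this position attends to every other
        weights.append(w)
        out.append([sum(wj * v[d] for wj, v in zip(w, V)) for d in range(len(V[0]))])
    return out, weights

def positional_encoding(n_pos, d):
    # Sinusoidal encoding: sin on even dims, cos on odd dims.
    pe = []
    for pos in range(n_pos):
        row = []
        for i in range(d):
            angle = pos / (10000 ** (2 * (i // 2) / d))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

# Toy example: 3 feature-map positions with d_k = 2 (illustrative sizes only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx, attn = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
pe = positional_encoding(4, 2)  # added to features before the attention layers
```

In a full model, per the abstract, such layers are stacked four deep in the encoder and six deep in the decoder, operating on the ResNeSt-50 feature maps; the attention weights over those maps are what yield the implicit line segmentation.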
Pages: 12