Over the past several years, archivists and historians have expressed a growing need for robust and efficient offline handwritten text recognition systems to assist them in transcribing handwritten archival documents. Such systems produce as output an editable text file containing the transcription of the digitized documents. Recently, great attention has been paid to the use of deep learning for solving various problems in document image analysis. In particular, deep learning architectures have been used extensively for handwritten text recognition to overcome the limitations of older approaches such as hidden Markov models. Contributing to this trend, we propose in this article a deep learning architecture that recognizes text lines in handwritten archival document images using octave-based convolutional and attention-based recurrent neural networks. The proposed architecture is composed of an encoder block and a decoder block. Octave convolutional layers are used in the encoder block, while an attention-based bidirectional long short-term memory network followed by a connectionist temporal classification layer is used in the decoder block. A set of experiments was carried out to show the effectiveness of the proposed architecture on several benchmark datasets of historical handwritten document images. Qualitative and quantitative results are reported and compared with those of recent state-of-the-art methods and of the participating methods in the ICDAR and ICFHR contests under the same conditions (i.e. without using a language model or lexicon constraints). The proposed architecture achieves low character error rates at line level on three different datasets: 6.02%, 4.30% and 6.9% for the IAM, Rimes and Bentham datasets, respectively.
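The character error rate (CER) reported above is conventionally computed as the Levenshtein (edit) distance between the predicted transcription and the ground-truth line, normalized by the length of the ground truth. The following is a minimal plain-Python sketch of that metric; the function names are illustrative and are not taken from the authors' code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    # One-row dynamic programming over the standard edit-distance recurrence.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

For example, a predicted line with one substituted character in an 11-character reference yields a CER of 1/11, about 9.1%; the 6.02% IAM result reported above corresponds to roughly 6 character errors per 100 reference characters.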