Handwritten text recognition and information extraction from ancient manuscripts using deep convolutional and recurrent neural network

Cited by: 0
Authors
El Bahi, Hassan [1]
Affiliations
[1] L2IS, Laboratory of Computer and Systems Engineering, Cadi Ayyad University, B.P. 511, Marrakech
Keywords
Ancient manuscripts; Convolutional neural network; Handwritten text recognition; Named entity recognition; Recurrent neural network
DOI
10.1007/s00500-024-09930-6
Abstract
Digitizing ancient manuscripts and making them accessible to a broader audience is a crucial step in unlocking the wealth of information they hold. However, automatic recognition of handwritten text and extraction of relevant information, such as named entities, from these manuscripts remain among the most difficult research topics because of factors such as poor manuscript quality, complex backgrounds, ink stains, and cursive handwriting. To meet these challenges, we propose two systems. The first performs handwritten text recognition (HTR) in ancient manuscripts. It starts with a preprocessing operation; a convolutional neural network (CNN) then extracts the features of each input image; finally, a recurrent neural network (RNN) with Long Short-Term Memory (LSTM) blocks and a Connectionist Temporal Classification (CTC) layer predicts the text contained in the image. The second system recognizes named entities and deciphers the relationships among words directly from images of old manuscripts, bypassing the need for an intermediate text transcription step. Like the first, it starts with a preprocessing step. Data augmentation is then used to enlarge the training dataset, after which a CNN model automatically extracts the most relevant features. Finally, a bidirectional LSTM recognizes the named entities and the relationships between word images. Extensive experiments on the ESPOSALLES dataset demonstrate that the proposed systems achieve state-of-the-art performance, exceeding existing systems. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
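The HTR system described in the abstract ends with a CTC layer that turns per-frame recurrent outputs into a text string. As a rough illustration of that final step only, and not the authors' implementation, the sketch below shows greedy (best-path) CTC decoding in plain Python: take the highest-scoring class in each frame, merge consecutive repeats, then drop the blank symbol. The function name, the blank index of 0, the toy alphabet, and the example scores are all assumptions made for this sketch; the paper may use a different decoder, such as beam search.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding.

    frame_scores: one list of class scores per time step (e.g. LSTM outputs).
    alphabet: sequence mapping class index -> character; alphabet[blank] is unused.
    """
    # 1. Best path: index of the highest-scoring class in every frame.
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # 2. Merge runs of the same class into a single occurrence.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps both of them.
    return "".join(alphabet[i] for i in collapsed if i != blank)


# Toy example: classes are 0 = blank, 1 = "a", 2 = "b".
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two "a" labels)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat)
    [0.80, 0.10, 0.10],  # blank
]
print(ctc_greedy_decode(frames, "-ab"))  # -> aab
```

The blank symbol is what lets CTC emit doubled letters: without the blank frame between them, the two runs of "a" would be merged into one.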
Pages: 12249 - 12268
Page count: 19