Handwritten text recognition and information extraction from ancient manuscripts using deep convolutional and recurrent neural network

Cited by: 0
Author
El Bahi, Hassan [1]
Affiliation
[1] L2IS, Laboratory of Computer and Systems Engineering, Cadi Ayyad University, B.P. 511, Marrakech
Keywords
Ancient manuscripts; Convolutional neural network; Handwritten text recognition; Named entity recognition; Recurrent neural network
DOI
10.1007/s00500-024-09930-6
Abstract
Digitizing ancient manuscripts and making them accessible to a broader audience is a crucial step in unlocking the wealth of information they hold. However, automatic recognition of handwritten text and extraction of relevant information, such as named entities, from these manuscripts remain among the most difficult research topics because of factors such as poor manuscript quality, complex backgrounds, ink stains, and cursive handwriting. To meet these challenges, we propose two systems. The first performs handwritten text recognition (HTR) in ancient manuscripts. It starts with a preprocessing operation; a convolutional neural network (CNN) then extracts the features of each input image; finally, a recurrent neural network (RNN) built from Long Short-Term Memory (LSTM) blocks, followed by a Connectionist Temporal Classification (CTC) layer, predicts the text contained in the image. The second system focuses on recognizing named entities and deciphering the relationships among words directly from images of old manuscripts, bypassing the intermediate text transcription step. Like the first system, it starts with a preprocessing step. Data augmentation is then used to enlarge the training dataset, after which a CNN model automatically extracts the most relevant features. Finally, a bidirectional LSTM recognizes the named entities and the relationships between word images. Extensive experiments on the ESPOSALLES dataset demonstrate that the proposed systems achieve state-of-the-art performance, exceeding existing systems. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
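The abstract does not say how the CTC layer's per-frame outputs are turned into text. As an illustration only, the sketch below shows best-path (greedy) CTC decoding, a common choice: take the most likely label in each frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label 0 is the CTC blank; labels 1..N map to alphabet[0..N-1]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely label index for every time step (frame).
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded: List[str] = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when it is not blank and differs from the
        # label of the previous frame; a blank between two identical labels
        # therefore lets the same character appear twice in a row.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


if __name__ == "__main__":
    # Toy scores over (blank, 'a', 'b') for six frames: a a blank a b b
    frames = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.6, 0.3],
        [0.1, 0.2, 0.7],
        [0.2, 0.1, 0.7],
    ]
    print(ctc_greedy_decode(frames, "ab"))
```

In a full recognizer the frame scores would come from the LSTM output for one text-line image; beam search with a language model is a frequently used alternative to this greedy rule.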
Pages: 12249-12268
Number of pages: 19
Related papers
50 in total
  • [41] Convolutional Neural Network Based Intelligent Handwritten Document Recognition
    Abbas, Sagheer
    Alhwaiti, Yousef
    Fatima, Areej
    Khan, Muhammad A.
    Khan, Muhammad Adnan
    Ghazal, Taher M.
    Kanwal, Asma
    Ahmad, Munir
    Elmitwally, Nouh Sabri
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (03): 4563-4581
  • [42] Recognition of online handwritten Gurmukhi characters using recurrent neural network classifier
    Harjeet Singh
    R. K. Sharma
    V. P. Singh
    Munish Kumar
    Soft Computing, 2021, 25: 6329-6338
  • [43] Isolated Bangla Handwritten Character Recognition with Convolutional Neural Network
    Alif, Mujadded Al Rabbani
    Ahmed, Sabbir
    Hasan, Muhammad Abul
    2017 20TH INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2017
  • [44] Isolated Handwritten Balinese Character Recognition from Palm Leaf Manuscripts with Residual Convolutional Neural Networks
    Arsa, Dewa Made Sri
    Putri, Gusti Agung Ayu
    Zen, Remmy
    Bressan, Stephane
    2020 12TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (IEEE KSE 2020), 2020: 224-229
  • [45] A Sketch Recognition Method Based on Deep Convolutional-Recurrent Neural Network
    Zhao P.
    Liu Y.
    Liu H.
    Yao S.
    2018, Institute of Computing Technology (30): 217-224
  • [46] Research on advertising content recognition based on convolutional neural network and recurrent neural network
    Liu, Xiaomei
    Qi, Fazhi
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2021, 24 (04): 398-404
  • [47] Recurrence-free unconstrained handwritten text recognition using gated fully convolutional network
    Coquenet, Denis
    Chatelain, Clement
    Paquet, Thierry
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020: 19-24
  • [48] Deep learning classification of biomedical text using convolutional neural network
    Dollah R.
    Sheng C.Y.
    Zakaria N.
    Othman M.S.
    Rasib A.W.
    International Journal of Advanced Computer Science and Applications, 2019, 10 (08): 512-517
  • [49] Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition
    Agrawal, Vanita
    Jagtap, Jayant
    Patil, Shruti
    Kotecha, Ketan
    METHODSX, 2024, 12
  • [50] Deep Learning Classification of Biomedical Text using Convolutional Neural Network
    Dollah, Rozilawati
    Sheng, Chew Yi
    Zakaria, Norhawaniah
    Othman, Mohd Shahizan
    Rasib, Abd Wahid
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (08): 512-517