Soft set-based MSER end-to-end system for occluded scene text detection, recognition and prediction

Authors
Das, Alloy [1]
Palaiahnakote, Shivakumara [2]
Banerjee, Ayan [1]
Antonacopoulos, Apostolos [2]
Pal, Umapada [1]
Affiliations
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India
[2] Univ Salford, Pattern Recognit & Image Anal PRImA Res Lab, Manchester, England
Keywords
Scene text detection; Scene text recognition; Scene text correction; Occluded scene text; Graph neural network; Convolutional recurrent neural network; Convolutional neural network
DOI
10.1016/j.knosys.2024.112593
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The presence of unpredictable occlusions on natural scene text is a significant challenge, exacerbating the difficulties that the variability of such images already poses for text detection and recognition. To meet the need for a robust, consistently performing approach to these challenges, this paper presents a new Soft Set-based end-to-end system for text detection, recognition and prediction in occluded natural scene images. This is the first approach to integrate text detection, recognition and prediction, unlike existing systems developed for end-to-end text spotting (text detection and recognition) only. For candidate text component detection, the proposed combination of Soft Sets with Maximally Stable Extremal Regions (SSMSER) improves text detection and spotting in natural scene images, irrespective of the presence of arbitrarily oriented and shaped text, complex backgrounds and occlusion. Furthermore, a Graph Recurrent Neural Network is proposed for grouping candidate text components into text lines and for fitting accurate bounding boxes to each word. Finally, a Convolutional Recurrent Neural Network (CRNN) is proposed for recognizing text and for predicting characters missing due to occlusion. Experimental results on a new occluded scene text dataset (OSTD) and on the most relevant benchmark natural scene text datasets demonstrate that the proposed system outperforms the state of the art in text detection, recognition and prediction. The code and dataset are available at https://github.com/alloydas/Softset-MSER-Based-Occluded-Scene-Text-Spotting/blob/master/Soft_set_MSER.ipynb
Pages: 19