Soft set-based MSER end-to-end system for occluded scene text detection, recognition and prediction

Cited by: 0

Authors:

Das, Alloy [1]

Palaiahnakote, Shivakumara [2]

Banerjee, Ayan [1]

Antonacopoulos, Apostolos [2]

Pal, Umapada [1]

Affiliations:

[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India

[2] Univ Salford, Pattern Recognit & Image Anal PRImA Res Lab, Manchester, England

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 305

Keywords:

Scene text detection; Scene text recognition; Scene text correction; Occluded scene text; Graph neural network; Convolutional recurrent neural network; Convolutional neural network;

DOI:

10.1016/j.knosys.2024.112593

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The presence of unpredictable occlusions on natural scene text is a significant challenge, exacerbating the difficulties already posed on text detection and recognition by the variability of such images. Addressing the need for a robust, consistently performing approach that can effectively address the above challenges, this paper presents a new Soft Set-based end-to-end system for text detection, recognition and prediction in occluded natural scene images. This is the first approach to integrate text detection, recognition and prediction, unlike existing systems developed for end-to-end text spotting (text detection and recognition) only. For candidate text components detection, the proposed combination of Soft Sets with Maximally Stable Extremal Regions (SSMSER) improves text detection and spotting in natural scene images, irrespectively of the presence of arbitrarily orientated and shaped text, complex backgrounds and occlusion. Furthermore, a Graph Recurrent Neural Network is proposed for grouping candidate text components into text lines and for fitting accurate bounding boxes to each word. Finally, a Convolutional Recurrent Neural Network (CRNN) is proposed for the recognition of text and for predicting missing characters due to occlusion. Experimental results on a new occluded scene text dataset (OSTD) and on the most relevant benchmark natural scene text datasets demonstrate that the proposed system outperforms the state-of-the-art in text detection, recognition and prediction. The code and dataset are available at https://github.com/alloydas/Softset-MSER-Based-Occluded-Scene-Text-Spotting/blob/master/Soft_set_MSER.ipynb

Pages: 19

50 items in total

[31] Gaussian Prediction based Attention for Online End-to-End Speech Recognition
Hou, Junfeng
Zhang, Shiliang
Dai, Lirong
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3692 - 3696
[32] A Deep Learning-Based End-to-End Composite System for Hand Detection and Gesture Recognition
Mohammed, Adam Ahmed Qaid
Lv, Jiancheng
Islam, Md. Sajjatul
SENSORS, 2019, 19 (23)
[33] End-to-end speech recognition system based on improved CLDNN structure
Feng, Yujie
Zhang, Yi
Xu, Xuan
PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 538 - 542
[34] Deep-learning based end-to-end system for text reading in the wild
Harizi, Riadh
Walha, Rim
Drira, Fadoua
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 24691 - 24719
[35] Deep-learning based end-to-end system for text reading in the wild
Riadh Harizi
Rim Walha
Fadoua Drira
Multimedia Tools and Applications, 2022, 81 : 24691 - 24719
[36] Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
Tang, Jingqun
Qian, Wenming
Song, Luchuan
Dong, Xiena
Li, Lan
Bai, Xiang
COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 233 - 248
[37] Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition
Yoshihashi, Ryota
Tanaka, Tomohiro
Doi, Kenji
Fujino, Takumi
Yamashita, Naoaki
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II, 2021, 12822 : 240 - 257
[38] EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION
Drexler, Jennifer
Glass, James
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 913 - 919
[39] End-to-end DNN based text-independent speaker recognition for long and short utterances
Rohdin, Johan
Silnova, Anna
Diez, Mireia
Plchot, Oldrich
Matejka, Pavel
Burget, Lukas
Glembek, Ondrej
COMPUTER SPEECH AND LANGUAGE, 2020, 59 : 22 - 35
[40] Natural scene text detection and recognition based on saturation-incorporated multi-channel MSER
Tong, Guoxiang
Dong, Ming
Sun, Xiaoxia
Song, Yan
KNOWLEDGE-BASED SYSTEMS, 2022, 250
