Soft set-based MSER end-to-end system for occluded scene text detection, recognition and prediction

Cited by: 0

Authors:

Das, Alloy [1]

Palaiahnakote, Shivakumara [2]

Banerjee, Ayan [1]

Antonacopoulos, Apostolos [2]

Pal, Umapada [1]

Affiliations:

[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India

[2] Univ Salford, Pattern Recognit & Image Anal PRImA Res Lab, Manchester, England

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 305

Keywords:

Scene text detection; Scene text recognition; Scene text correction; Occluded scene text; Graph neural network; Convolutional recurrent neural network; Convolutional neural network;

DOI:

10.1016/j.knosys.2024.112593

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The presence of unpredictable occlusions on natural scene text is a significant challenge, exacerbating the difficulties that the variability of such images already poses for text detection and recognition. To meet the need for a robust, consistently performing approach to these challenges, this paper presents a new Soft Set-based end-to-end system for text detection, recognition and prediction in occluded natural scene images. This is the first approach to integrate text detection, recognition and prediction, unlike existing systems developed for end-to-end text spotting (text detection and recognition) only. For candidate text component detection, the proposed combination of Soft Sets with Maximally Stable Extremal Regions (SSMSER) improves text detection and spotting in natural scene images, irrespective of the presence of arbitrarily oriented and shaped text, complex backgrounds and occlusion. Furthermore, a Graph Recurrent Neural Network is proposed for grouping candidate text components into text lines and for fitting accurate bounding boxes to each word. Finally, a Convolutional Recurrent Neural Network (CRNN) is proposed for recognizing text and for predicting characters that are missing due to occlusion. Experimental results on a new occluded scene text dataset (OSTD) and on the most relevant benchmark natural scene text datasets demonstrate that the proposed system outperforms the state-of-the-art in text detection, recognition and prediction. The code and dataset are available at https://github.com/alloydas/Softset-MSER-Based-Occluded-Scene-Text-Spotting/blob/master/Soft_set_MSER.ipynb


Pages: 19
