Soft set-based MSER end-to-end system for occluded scene text detection, recognition and prediction

Cited by: 0

Authors:

Das, Alloy ^{[1]}

Palaiahnakote, Shivakumara ^{[2]}

Banerjee, Ayan ^{[1]}

Antonacopoulos, Apostolos ^{[2]}

Pal, Umapada ^{[1]}

Affiliations:

[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India

[2] Univ Salford, Pattern Recognit & Image Anal PRImA Res Lab, Manchester, England

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 305

Keywords:

Scene text detection; Scene text recognition; Scene text correction; Occluded scene text; Graph neural network; Convolutional recurrent neural network; Convolutional neural network;

DOI:

10.1016/j.knosys.2024.112593

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The presence of unpredictable occlusions on natural scene text is a significant challenge, exacerbating the difficulties that the variability of such images already poses to text detection and recognition. To meet the need for a robust, consistently performing approach to these challenges, this paper presents a new Soft Set-based end-to-end system for text detection, recognition and prediction in occluded natural scene images. This is the first approach to integrate text detection, recognition and prediction, unlike existing systems developed for end-to-end text spotting (text detection and recognition) only. For detecting candidate text components, the proposed combination of Soft Sets with Maximally Stable Extremal Regions (SSMSER) improves text detection and spotting in natural scene images, irrespective of the presence of arbitrarily oriented and shaped text, complex backgrounds and occlusion. Furthermore, a Graph Recurrent Neural Network is proposed for grouping candidate text components into text lines and for fitting accurate bounding boxes to each word. Finally, a Convolutional Recurrent Neural Network (CRNN) is proposed for the recognition of text and for predicting characters missing due to occlusion. Experimental results on a new occluded scene text dataset (OSTD) and on the most relevant benchmark natural scene text datasets demonstrate that the proposed system outperforms the state-of-the-art in text detection, recognition and prediction. The code and dataset are available at https://github.com/alloydas/Softset-MSER-Based-Occluded-Scene-Text-Spotting/blob/master/Soft_set_MSER.ipynb


Pages: 19

50 items in total

[1] End-to-End Scene Text Recognition
Wang, Kai
Babenko, Boris
Belongie, Serge
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 1457 - 1464
[2] End-to-End Scene Text Recognition with Character Centroid Prediction
Zhao, Wei
Ma, Jinwen
NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 291 - 299
[3] An End-to-End Scene Text Recognition for Bilingual Text
Albalawi, Bayan M.
Jamal, Amani T.
Al Khuzayem, Lama A.
Alsaedi, Olaa A.
BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (09)
[4] Transformer-based end-to-end scene text recognition
Zhu, Xinghao
Zhang, Zhi
PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
[5] EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition
Hao, Jiedong
Wen, Yafei
Deng, Jie
Gan, Jun
Ren, Shuai
Tan, Hui
Chen, Xiaoxin
DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 95 - 108
[6] End-to-End Analysis for Text Detection and Recognition in Natural Scene Images
Alnefaie, Ahlam
Gupta, Deepak
Bhuyan, Monowar H.
Razzak, Imran
Gupta, Prashant
Prasad, Mukesh
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
[7] Scene text spotting based on end-to-end
Wei G.
Rong W.
Liang Y.
Xiao X.
Liu X.
Journal of Intelligent and Fuzzy Systems, 2021, 40 (05) : 8871 - 8881
[8] End-to-end Scene Text Recognition in Videos Based on Multi Frame Tracking
Wang, Xiaobing
Jiang, Yingying
Yang, Shuli
Zhu, Xiangyu
Li, Wei
Fu, Pei
Wang, Hua
Luo, Zhenbo
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1255 - 1260
[9] Person Re-identification with End-to-End Scene Text Recognition
Kamlesh
Xu, Pei
Yang, Yang
Xu, Yongchao
COMPUTER VISION, PT III, 2017, 773 : 363 - 374
[10] An end-to-end model for multi-view scene text recognition
Banerjee, Ayan
Shivakumara, Palaiahnakote
Bhattacharya, Saumik
Pal, Umapada
Liu, Cheng-Lin
PATTERN RECOGNITION, 2024, 149
