Scene text spotting based on end-to-end

Cited: 0
|
Authors
Wei G. [1,2]
Rong W. [1 ]
Liang Y. [1 ]
Xiao X. [1 ]
Liu X. [1 ]
Affiliations
[1] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong
[2] College of Intelligent Equipment, Shandong University of Science and Technology, Taian, Shandong
Keywords
End-to-end; Joint optimization; SAM-BiLSTM; Scene text spotting; TCM
DOI
10.3233/JIFS-200903
CLC Number
TN911 [Communication Theory]
Subject Classification Code
081002
Abstract
Aiming at the problem that traditional OCR pipelines ignore the inherent connection between the text detection task and the text recognition task, this paper proposes a novel end-to-end text spotting framework. The framework comprises three parts: a shared convolutional feature network, a text detector, and a text recognizer. By sharing the convolutional feature network, the text detection network and the text recognition network can be jointly optimized at the same time; this both reduces the computational burden and effectively exploits the inherent connection between text detection and text recognition. The model adds a TCM (Text Context Module) on top of Mask R-CNN, which effectively mitigates the negative-sample problem in text detection tasks. The paper also proposes a text recognition model based on SAM-BiLSTM (a spatial attention mechanism with BiLSTM), which extracts the semantic information between characters more effectively. The model significantly surpasses state-of-the-art methods on a number of text detection and text spotting benchmarks, including ICDAR 2015 and Total-Text. © 2021 - IOS Press. All rights reserved.
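The abstract describes a three-part architecture: a shared convolutional feature network whose output feeds both a detection branch (Mask R-CNN plus TCM) and a SAM-BiLSTM recognition branch. Below is a minimal PyTorch-style sketch of that structure; every class name, tensor shape, and the simplified attention and RoI handling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Toy spatial attention: softmax-normalized weights over feature positions."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W)
        scores = self.score(feats)                        # (B, 1, H, W)
        weights = torch.softmax(scores.flatten(2), dim=-1).view_as(scores)
        return feats * weights                            # attention-weighted features


class Recognizer(nn.Module):
    """SAM-BiLSTM-style head: spatial attention over RoI features, then a BiLSTM."""

    def __init__(self, channels: int, hidden: int, num_classes: int):
        super().__init__()
        self.sam = SpatialAttention(channels)
        self.rnn = nn.LSTM(channels, hidden, bidirectional=True, batch_first=True)
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (B, C, H, W), one cropped text region per batch element
        x = self.sam(roi_feats).mean(dim=2)               # collapse height: (B, C, W)
        x = x.permute(0, 2, 1)                            # sequence along width: (B, W, C)
        out, _ = self.rnn(x)                              # (B, W, 2*hidden)
        return self.cls(out)                              # per-step character logits


class TextSpotter(nn.Module):
    """Shared backbone feeding both a detection head and a recognition head."""

    def __init__(self, backbone: nn.Module, detector: nn.Module, recognizer: nn.Module):
        super().__init__()
        self.backbone = backbone      # shared convolutional feature network
        self.detector = detector      # e.g. a Mask R-CNN-style head (plus TCM)
        self.recognizer = recognizer  # SAM-BiLSTM-style recognition head

    def forward(self, images, roi_feats):
        feats = self.backbone(images)        # computed once, shared by both tasks
        detections = self.detector(feats)    # text-region proposals/masks
        chars = self.recognizer(roi_feats)   # roi_feats: regions cropped from `feats`
        return detections, chars             # a joint loss optimizes both heads
```

Because both heads read the same backbone output, a single backward pass over the combined detection and recognition losses realizes the joint optimization the abstract describes.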
Pages: 8871-8881
Page count: 10
Related Papers
50 items in total
  • [31] End-to-End Transformer-Based Open-Vocabulary Keyword Spotting with Location-Guided Local Attention
    Wei, Bo
    Yang, Meirong
    Zhang, Tao
    Tang, Xiao
    Huang, Xing
    Kim, Kyuhong
    Lee, Jaeyun
    Cho, Kiho
    Park, Sung-Un
    INTERSPEECH 2021, 2021, : 361 - 365
  • [32] End-to-end text-to-speech synthesis with unaligned multiple language units based on attention
    Aso, Masashi
    Takamichi, Shinnosuke
    Saruwatari, Hiroshi
    INTERSPEECH 2020, 2020, : 4009 - 4013
  • [33] Myanmar Text-to-Speech Synthesis Using End-to-End Model
    Qin, Qinglai
    Yang, Jian
    Li, Peiying
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 6 - 11
  • [34] TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
    Zhang, Peng
    Xu, Yunlu
    Cheng, Zhanzhan
    Pu, Shiliang
    Lu, Jing
    Qiao, Liang
    Niu, Yi
    Wu, Fei
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1413 - 1422
  • [35] End-to-end hard constrained text generation via incrementally predicting segments
    Nie, Jinran
    Huang, Xuancheng
    Liu, Yang
    Kong, Cunliang
    Liu, Xin
    Yang, Liner
    Yang, Erhong
    KNOWLEDGE-BASED SYSTEMS, 2023, 278
  • [36] Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting
    Zhang, Kun
    Wu, Zhiyong
    Yuan, Daode
    Luan, Jian
    Jia, Jia
    Meng, Helen
    Song, Binheng
    INTERSPEECH 2020, 2020, : 2567 - 2571
  • [37] TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network
    Sun, Yipeng
    Zhang, Chengquan
    Huang, Zuming
    Liu, Jiaming
    Han, Junyu
    Ding, Errui
    COMPUTER VISION - ACCV 2018, PT III, 2019, 11363 : 83 - 99
  • [38] On the Training and Testing Data Preparation for End-to-End Text-to-Speech Application
    Duc Chung Tran
    Khan, M. K. A. Ahamed
    Sridevi, S.
    2020 11TH IEEE CONTROL AND SYSTEM GRADUATE RESEARCH COLLOQUIUM (ICSGRC), 2020, : 73 - 75
  • [39] An End-to-End Depression Recognition Method Based on EEGNet
    Liu, Bo
    Chang, Hongli
    Peng, Kang
    Wang, Xuenan
    FRONTIERS IN PSYCHIATRY, 2022, 13
  • [40] RefineNet-based End-to-end Speech Enhancement
    Lan, T.
    Peng, C.
    Li, S.
    Qian, Y.-X.
    Chen, C.
    Liu, Q.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48 (02): : 554 - 563