SemiText: Scene text detection with semi-supervised learning

Cited by: 22

Authors:

Liu, Juhua ^{[1,2,3]}

Zhong, Qihuang ^{[1,2]}

Yuan, Yuan ^{[4]}

Su, Hai ^{[5]}

Du, Bo ^{[2,6]}

Affiliations:

[1] Wuhan Univ, Sch Printing & Packaging, Wuhan, Peoples R China

[2] Wuhan Univ, Inst Artificial Intelligence, Wuhan, Peoples R China

[3] Wuhan Univ, Suzhou Inst, Suzhou, Peoples R China

[4] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China

[5] South China Normal Univ, Sch Software, Guangzhou, Peoples R China

[6] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China

Source:

NEUROCOMPUTING | 2020 / Vol. 407

Funding:

National Natural Science Foundation of China;

Keywords:

Scene text detection; Semi-supervised learning; Mask R-CNN; Context information;

DOI:

10.1016/j.neucom.2020.05.059

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene text detection is an important step of scene text recognition and has achieved significant progress. However, the requirement of large amounts of annotated training data, which is used for training text detection model, has become a great challenge for existing methods. In this paper, we propose a semi-supervised scene text detection framework (SemiText), which trains robust and accurate scene text detectors using a pre-trained supervised model and the unannotated data. With a pre-trained model that is pre-trained on the fully annotated synthetic dataset, i.e., SynthText, we investigate the inductive and transductive semi-supervised learning on the unannotated dataset respectively. For inductive learning, the pre-trained model is applied to the unannotated training dataset to search for more training examples, which are further combined with SynthText to fine-tune the pre-trained model and achieve a superior detection model. For transductive learning, the unannotated training dataset is replaced with the unannotated test dataset. Meanwhile, for the aim of real-world applications, we adopt Mask R-CNN to detect text with arbitrary shapes and exploit context information to suppress false positives. Extensive experiments on different datasets show that the performance of our text detection method can be clearly improved under both inductive and transductive semi-supervision. Additionally, we also achieve state-of-the-art performance under full supervision. (C) 2020 Elsevier B.V. All rights reserved.
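The inductive procedure described in the abstract (apply the pre-trained model to unannotated images, keep the resulting examples, and combine them with SynthText for fine-tuning) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `detect` callable, the `Detection` tuple layout, and the confidence threshold `min_score` are hypothetical, and the abstract does not state how SemiText actually selects its additional training examples.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout for illustration: (polygon vertices, confidence score).
Polygon = List[Tuple[float, float]]
Detection = Tuple[Polygon, float]
LabeledImage = Tuple[str, List[Detection]]


def select_pseudo_labels(
    images: Sequence[str],
    detect: Callable[[str], List[Detection]],
    min_score: float = 0.9,  # assumed threshold, not taken from the paper
) -> List[LabeledImage]:
    """Run a pre-trained detector on unannotated images and keep only
    the detections whose confidence reaches min_score."""
    pseudo_labeled: List[LabeledImage] = []
    for image in images:
        confident = [d for d in detect(image) if d[1] >= min_score]
        if confident:  # images with no confident detection are skipped
            pseudo_labeled.append((image, confident))
    return pseudo_labeled


def build_finetune_set(
    synthetic: Sequence[LabeledImage],
    pseudo_labeled: Sequence[LabeledImage],
) -> List[LabeledImage]:
    """Combine fully annotated synthetic data with the pseudo-labeled images."""
    return list(synthetic) + list(pseudo_labeled)
```

Under these assumptions, the transductive setting differs only in the input: the unannotated test images are passed as `images` instead of the unannotated training images.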

Citation

Pages: 343-353

Number of pages: 11