You Only Recognize Once: Towards Fast Video Text Spotting

Cited by: 26

Authors:

Cheng, Zhanzhan [1]

Lu, Jing [1]

Niu, Yi [1]

Pu, Shiliang [1]

Wu, Fei [2]

Zhou, Shuigeng [3]

Affiliations:

[1] Hikvis Res Inst, Hangzhou, Peoples R China

[2] Zhejiang Univ, Hangzhou, Peoples R China

[3] Fudan Univ, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019

Keywords:

video text spotting; detection; tracking; quality scoring

DOI:

10.1145/3343031.3351093

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

Video text spotting remains an important research topic because of its many real-world applications. Previous approaches usually follow a four-stage pipeline: detecting text in individual frames, recognizing the localized text regions frame by frame, tracking text streams, and generating final results with complicated post-processing. Such pipelines may suffer from huge computational cost as well as interference from low-quality text. In this paper, we propose a fast and robust video text spotting framework that recognizes the localized text only once instead of frame by frame. Specifically, we first obtain text regions in videos with a well-designed spatial-temporal detector. We then develop a novel text recommender that selects the highest-quality text from text streams and recognizes only the selected text. The recommender assembles text tracking, quality scoring and recognition into an end-to-end trainable module, which not only avoids interference from low-quality text but also dramatically speeds up the video text spotting process. In addition, we collect a larger-scale video text dataset (LSVTD) to support the video text spotting community; it contains 100 text videos from 22 different real-life scenarios. Extensive experiments on two public benchmarks show that our method speeds up the recognition process by 71 times on average compared with the frame-wise manner, and also achieves state-of-the-art results.
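The "recognize once" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper describes an end-to-end trainable recommender, whereas the code below assumes that tracking and quality scoring have already produced a `track_id` and a `quality` value for every detected region. The names `TextRegion`, `select_best_per_track`, `spot_text_once` and the `recognize` callable are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class TextRegion:
    """One detected text region in one frame (hypothetical structure)."""
    track_id: int    # identity assumed to come from a tracker
    frame_idx: int   # frame in which the region was detected
    quality: float   # assumed quality score; higher means easier to read
    crop: str        # stand-in for the cropped image data


def select_best_per_track(regions: Iterable[TextRegion]) -> Dict[int, TextRegion]:
    """Keep only the highest-quality region of each text stream.

    On a tie, the region seen first is kept.
    """
    best: Dict[int, TextRegion] = {}
    for region in regions:
        current = best.get(region.track_id)
        if current is None or region.quality > current.quality:
            best[region.track_id] = region
    return best


def spot_text_once(
    regions: Iterable[TextRegion],
    recognize: Callable[[str], str],
) -> Dict[int, str]:
    """Call the recognizer once per text stream, on its best region only."""
    best = select_best_per_track(regions)
    return {track_id: recognize(region.crop) for track_id, region in best.items()}
```

Under these assumptions, a text stream that spans N frames triggers one recognizer call rather than N, which is the source of the speed-up the abstract describes relative to frame-wise recognition.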


Pages: 855-863

Page count: 9
