You Only Recognize Once: Towards Fast Video Text Spotting

Cited by: 26
Authors
Cheng, Zhanzhan [1 ]
Lu, Jing [1 ]
Niu, Yi [1 ]
Pu, Shiliang [1 ]
Wu, Fei [2 ]
Zhou, Shuigeng [3 ]
Affiliations
[1] Hikvis Res Inst, Hangzhou, Peoples R China
[2] Zhejiang Univ, Hangzhou, Peoples R China
[3] Fudan Univ, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019
Keywords
video text spotting; detection; tracking; quality scoring
DOI
10.1145/3343031.3351093
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Video text spotting remains an important research topic because of its many real-world applications. Previous approaches usually follow a four-stage pipeline: detecting text in individual frames, recognizing the localized text regions frame by frame, tracking text streams, and generating final results with complicated post-processing. Such pipelines can suffer from high computational cost and from interference by low-quality text. In this paper, we propose a fast and robust video text spotting framework that recognizes the localized text only once instead of frame by frame. Specifically, we first obtain text regions in videos with a well-designed spatial-temporal detector. We then develop a novel text recommender that selects the highest-quality text from text streams and recognizes only the selected text. The recommender assembles text tracking, quality scoring and recognition into an end-to-end trainable module, which both avoids interference from low-quality text and dramatically speeds up video text spotting. In addition, we collect a larger-scale video text dataset (LSVTD) to promote the video text spotting community; it contains 100 text videos from 22 different real-life scenarios. Extensive experiments on two public benchmarks show that our method speeds up recognition by an average of 71 times compared with frame-wise recognition, and achieves state-of-the-art performance.
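The "recognize once" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' recommender, which the abstract describes as an end-to-end trainable module. The sketch assumes that tracking and quality scoring have already produced a track ID and a quality score for every detected text region, and it keeps only the highest-scoring region per track before calling a recognizer. The names `Detection`, `select_best_per_track`, `spot_text` and the `recognize` callback are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Detection:
    """One detected text region in one frame (all fields are illustrative)."""
    frame_index: int
    track_id: int    # assumed to come from a separate tracking step
    quality: float   # assumed quality score, higher is better
    crop: str        # stand-in for the cropped text image


def select_best_per_track(detections: List[Detection]) -> Dict[int, Detection]:
    """Keep the highest-quality detection of each text stream (track)."""
    best: Dict[int, Detection] = {}
    for det in detections:
        current = best.get(det.track_id)
        if current is None or det.quality > current.quality:
            best[det.track_id] = det
    return best


def spot_text(detections: List[Detection],
              recognize: Callable[[str], str]) -> Dict[int, str]:
    """Run the recognizer once per track instead of once per frame."""
    best = select_best_per_track(detections)
    return {track_id: recognize(det.crop) for track_id, det in best.items()}
```

With this selection step, the number of recognizer calls equals the number of text streams rather than the number of per-frame detections, which is the source of the speed-up the abstract describes.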
Pages: 855-863
Number of pages: 9