Temporally-aware Convolutional Block Attention Module for Video Text Detection

Cited by: 6
Authors
Fujitake, Masato [1]
Ge, Hongpeng [2]
Affiliations
[1] Grad Univ Adv Studies, Dept Informat, SOKENDAI, Tokyo, Japan
[2] Fast Accounting, Dept Adv Technol Res & Dev, Tokyo, Japan
Source
2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC) | 2021
DOI
10.1109/SMC52423.2021.9658799
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scene text in video carries rich semantic information that is of great value in various content-based video applications. Existing methods improve accuracy by, for example, incorporating tracking; however, many have complex structures and are difficult to run in real time. To achieve real-time operation, this paper proposes a simple and practical feature refinement module, named Temporally-aware Convolutional Block Attention Module (TCBAM), based on a novel self-attention recurrent neural network. The model exploits still-image-based feature maps to refine temporally constant feature maps so as to better capture the widely varied appearances of video text. For better generalization, we also provide a flow-based data augmentation method that uses artificial data. Experiments on scene text video datasets, including ICDAR 2013 Video, Minetto, and RoadText-1K, demonstrate that the proposed methods achieve accuracy competitive with state-of-the-art models while running in real time. Our method with ResNeXt-50 runs at 17 FPS with an F-score of 73.11 on ICDAR 2013 Video without complex tracking methods.
Pages: 220-225
Page count: 6