Temporally-aware Convolutional Block Attention Module for Video Text Detection

Cited by: 6

Authors:

Fujitake, Masato [1]

Ge, Hongpeng [2]

Affiliations:

[1] Grad Univ Adv Studies, Dept Informat, SOKENDAI, Tokyo, Japan

[2] Fast Accounting, Dept Adv Technol Res & Dev, Tokyo, Japan

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC) | 2021

DOI:

10.1109/SMC52423.2021.9658799

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Scene text in video carries rich semantic information that is of great value in various content-based video applications. Existing methods improve accuracy by, for example, incorporating tracking; however, many have complex structures and are difficult to run in real time. To achieve real-time operation, this paper proposes a simple and practical feature refinement module, named Temporally-aware Convolutional Block Attention Module (TCBAM), based on a novel self-attention recurrent neural network. The model exploits still-image-based feature maps to refine temporally constant feature maps so as to better capture the widely varied appearances of video text. For better generalization, we also provide a flow-based data augmentation method with artificial data. Experiments on scene text video datasets, including ICDAR 2013 Video, Minetto, and RoadText-1K, demonstrate that the proposed methods achieve accuracy competitive with state-of-the-art models while running in real time. Our method with ResNeXt-50 runs at 17 FPS with a 73.11 F-score on ICDAR 2013 Video without complex tracking methods.

Pages: 220-225

Page count: 6

