Contextual Text Block Detection Towards Scene Text Understanding

Cited by: 5
Authors
Xue, Chuhui [1, 2]
Huang, Jiaxing [1]
Zhang, Wenqing [2]
Lu, Shijian [1]
Wang, Changhu [2]
Bai, Song [2]
Affiliations
[1] Nanyang Technological University, Singapore, Singapore
[2] ByteDance Inc, Singapore, Singapore
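Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
Keywords
Scene text detection; Recognition;
DOI
10.1007/978-3-031-19815-1_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract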
Most existing scene text detectors focus on detecting characters or words, which capture only partial text messages because contextual information is missing. For a better understanding of text in scenes, it is more desirable to detect contextual text blocks (CTBs), which consist of one or multiple integral text units (e.g., characters, words, or phrases) in natural reading order and convey complete text messages. This paper presents contextual text detection, a new setup that detects CTBs for better understanding of text in scenes. We formulate the new setup as a dual detection task that first detects integral text units and then groups them into CTBs. To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups those belonging to the same CTB into an ordered token sequence. In addition, we create two datasets, SCUT-CTW-Context and ReCTS-Context, to facilitate future research, where each CTB is annotated by an ordered sequence of integral text units. Further, we introduce three metrics that measure contextual text detection in terms of local accuracy, continuity, and global accuracy. Extensive experiments show that our method accurately detects CTBs, which effectively facilitates downstream tasks such as text classification and translation. The project is available at https://sg-vilab.github.io/publication/xue2022contextual/.
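The grouping step described in the abstract (integral text units assembled into ordered sequences, one per CTB) can be illustrated with a minimal sketch. This is not the paper's clustering technique. It assumes a hypothetical upstream model has already predicted, for each detected unit, the id of the unit that follows it in reading order (or None at the end of a block). The names `TextUnit`, `group_into_ctbs`, and `successor` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextUnit:
    """A detected integral text unit (hypothetical structure, not from the paper)."""
    uid: int
    text: str


def group_into_ctbs(
    units: List[TextUnit],
    successor: Dict[int, Optional[int]],
) -> List[List[TextUnit]]:
    """Follow predicted successor links to build ordered blocks.

    Assumes each unit has at most one predecessor and the links are acyclic.
    Units that belong to a pure cycle have no head and are therefore skipped.
    """
    by_id = {u.uid: u for u in units}
    # A unit that is some other unit's successor cannot start a block.
    has_predecessor = {nxt for nxt in successor.values() if nxt is not None}

    blocks: List[List[TextUnit]] = []
    for unit in units:
        if unit.uid in has_predecessor:
            continue
        block: List[TextUnit] = []
        seen = set()
        cur: Optional[int] = unit.uid
        # The `seen` guard only prevents an infinite loop on malformed links.
        while cur is not None and cur not in seen:
            seen.add(cur)
            block.append(by_id[cur])
            cur = successor.get(cur)
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    units = [TextUnit(0, "SALE"), TextUnit(1, "50%"), TextUnit(2, "OFF"), TextUnit(3, "EXIT")]
    links = {0: 1, 1: 2, 2: None, 3: None}
    for block in group_into_ctbs(units, links):
        print(" ".join(u.text for u in block))
```

In this toy input, the three units of a sign reading "SALE 50% OFF" form one ordered block, and the isolated "EXIT" forms a block of its own.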
Pages: 374-391
Page count: 18