ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting

被引:2
作者
Huang, Zepeng [1 ]
Wan, Qi [1 ]
Chen, Junliang [1 ]
Zhao, Xiaodong [1 ]
Ye, Kai [1 ]
Shen, Linlin [1 ]
机构
[1] Shenzhen Univ, Guangdong Key Lab Intelligent Informat Proc, Comp Vis Inst, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
来源
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023年
基金
中国国家自然科学基金;
关键词
End-to-end text spotting; text detection; text recognition; segmentation;
D O I
10.1109/ICME55011.2023.00243
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.
引用
收藏
页码:1403 / 1408
页数:6
相关论文
共 40 条
[1]  
[Anonymous], 2006, P 23 INT C MACH LEAR, DOI DOI 10.1145/1143844.1143891
[2]   Character Region Awareness for Text Detection [J].
Baek, Youngmin ;
Lee, Bado ;
Han, Dongyoon ;
Yun, Sangdoo ;
Lee, Hwalsuk .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :9357-9366
[3]   Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework [J].
Busta, Michal ;
Neumann, Lukas ;
Matas, Jiri .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :2223-2231
[4]  
Carion N, 2020, Img Proc Comp Vis Re, V12346, P213, DOI 10.1007/978-3-030-58452-8_13
[5]   Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition [J].
Ch'ng, Chee Kheng ;
Chan, Chee Seng .
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, :935-942
[6]   TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting [J].
Feng, Wei ;
He, Wenhao ;
Yin, Fei ;
Zhang, Xu-Yao ;
Liu, Cheng-Lin .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9075-9084
[7]   Fast R-CNN [J].
Girshick, Ross .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1440-1448
[8]   Synthetic Data for Text Localisation in Natural Images [J].
Gupta, Ankush ;
Vedaldi, Andrea ;
Zisserman, Andrew .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2315-2324
[9]  
He KM, 2017, IEEE I CONF COMP VIS, P2980, DOI [10.1109/ICCV.2017.322, 10.1109/TPAMI.2018.2844175]
[10]  
Karatzas D, 2015, PROC INT CONF DOC, P1156, DOI 10.1109/ICDAR.2015.7333942