37 entries in total
[1] Vision Transformer for Fast and Efficient Scene Text Recognition [J]. DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I, 2021, 12821: 319-334
[2] What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4714-4722
[3] Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 14920-14929
[4] Chen SZ, 2021, ADV NEUR IN, V34
[5] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[6] Dosovitskiy A., 2021, P 9 INT C LEARN REPR
[7] Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 7094-7103
[8] Fenfen Sheng, 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR). Proceedings, P781, DOI 10.1109/ICDAR.2019.00130
[9] Multi-modal Transformer for Video Retrieval [J]. COMPUTER VISION - ECCV 2020, PT IV, 2020, 12349: 214-229
[10] Synthetic Data for Text Localisation in Natural Images [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 2315-2324