Self-supervised Implicit Glyph Attention for Text Recognition

Cited: 13
Authors
Guan, Tongkun [1 ]
Gu, Chaochen [2 ]
Tu, Jingzheng [2 ]
Yang, Xue [1 ]
Feng, Qi [2 ]
Zhao, Yudi [2 ]
Shen, Wei [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
NETWORK
DOI
10.1109/CVPR52729.2023.01467
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The attention mechanism has become the de facto module in scene text recognition (STR) methods, due to its capability of extracting character-level representations. These methods can be categorized as implicit attention based or supervised attention based, depending on how the attention is computed: implicit attention is learned from sequence-level text annotations, whereas supervised attention is learned from character-level bounding box annotations. Implicit attention, as it may extract coarse or even incorrect spatial regions as character attention, is prone to alignment drift. Supervised attention can alleviate this issue, but it is character-category-specific: it requires extra, laborious character-level bounding box annotations and would be memory-intensive when handling languages with large character sets. To address these issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). SIGA delineates the glyph structures of text images by jointly performing self-supervised text segmentation and implicit attention alignment, which serve as supervision to improve attention correctness without extra character-level annotations. Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods, in terms of both attention correctness and final recognition performance, on publicly available context benchmarks and on our contributed contextless benchmarks.
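To make the distinction between implicit and supervised attention concrete, the following PyTorch sketch shows a minimal implicit-attention decoder of the kind the abstract contrasts with SIGA: learned per-character queries attend over a 2-D visual feature map and are trained only from sequence-level text labels, so the resulting attention maps (the quantity SIGA supervises with self-supervised glyph structures) can drift without character-level guidance. All class names, dimensions, and design choices here are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class ImplicitCharAttention(nn.Module):
    # Toy implicit-attention decoder for STR (an illustrative assumption,
    # not SIGA itself): attention weights are learned solely from
    # sequence-level text supervision, with no character bounding boxes.
    def __init__(self, feat_dim=256, max_len=25, num_classes=37):
        super().__init__()
        self.queries = nn.Embedding(max_len, feat_dim)    # one query per character slot
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        # feats: (B, C, H, W) visual features from a CNN/ViT backbone.
        B, C, H, W = feats.shape
        keys = feats.flatten(2).transpose(1, 2)                    # (B, H*W, C)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)     # (B, T, C)
        # Scaled dot-product attention; `attn` holds the per-character
        # spatial maps that SIGA aims to align with glyph structures.
        attn = torch.softmax(q @ keys.transpose(1, 2) / C ** 0.5, dim=-1)
        glimpses = attn @ keys                                     # (B, T, C)
        logits = self.classifier(glimpses)                         # (B, T, num_classes)
        return logits, attn.view(B, -1, H, W)

Trained with only a cross-entropy loss on the per-slot character logits, nothing constrains attn to cover exactly one glyph, which is the alignment-drift problem described above; SIGA's contribution is to derive glyph-level supervision for these maps without extra annotation.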
Pages: 15285-15294
Page count: 10
Related Papers
50 records in total
  • [31] Reinforcement Learning with Attention that Works: A Self-Supervised Approach
    Manchin, Anthony
    Abbasnejad, Ehsan
    van den Hengel, Anton
    NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 223 - 230
  • [32] Heuristic Attention Representation Learning for Self-Supervised Pretraining
    Van Nhiem Tran
    Liu, Shen-Hsuan
    Li, Yung-Hui
    Wang, Jia-Ching
    SENSORS, 2022, 22 (14)
  • [33] Graph Multihead Attention Pooling with Self-Supervised Learning
    Wang, Yu
    Hu, Liang
    Wu, Yang
    Gao, Wanfu
    ENTROPY, 2022, 24 (12)
  • [34] Self-Supervised Graph Attention Collaborative Filtering for Recommendation
    Zhu, Jiangqiang
    Li, Kai
    Peng, Jinjia
    Qi, Jing
    ELECTRONICS, 2023, 12 (04)
  • [35] Self-Supervised Attention-Aware Reinforcement Learning
    Wu, Haiping
Khetarpal, Khimya
    Precup, Doina
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10311 - 10319
  • [36] Self-supervised recurrent depth estimation with attention mechanisms
    Makarov, Ilya
    Bakhanova, Maria
    Nikolenko, Sergey
    Gerasimova, Olga
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [39] Self-supervised attention flow for dialogue state tracking
    Pan, Boyuan
    Yang, Yazheng
    Li, Bo
    Cai, Deng
    NEUROCOMPUTING, 2021, 440 : 279 - 286
  • [40] Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition
    Yang, Yang
    Liu, Guangjun
    Gao, Xuehao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8623 - 8634