Deep representation learning for license plate recognition in low quality video images

Cited by: 0
Authors
Zhao, Kemeng [1]
Peng, Liangrui [1]
Ding, Ning [1]
Yao, Gang [1]
Tang, Pei [1]
Wang, Shengjin [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
Keywords
License plate recognition; Object detection and tracking; Deep learning; Self-attention mechanism; Spatial transformer network
DOI
10.1007/s00138-025-01678-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
License plate recognition is an important technology in application scenarios such as traffic monitoring and vehicle management. Because of variations in viewpoint and illumination, motion blur, and degradation during the imaging process, detecting and recognizing license plates in low-quality video images remains a challenging problem. In this paper, we focus on efficient deep representation learning for license plate recognition, detection and tracking. For license plate recognition, we mainly investigate the configuration of different network structures. We design a novel backbone network structure called SACNN, which combines a convolutional neural network (CNN) with a self-attention mechanism to learn non-linear representations of the structural patterns of characters in low-quality video images. The proposed license plate recognition model employs the SACNN backbone network, a Long Short-Term Memory (LSTM) encoder and a Transformer decoder. For license plate detection, a Transformer encoder-decoder based method is adopted. To handle variations in license plate appearance and perspective, image rectification is performed with a spatial transformer network. For license plate tracking, a multi-object tracking method uses Kalman filtering and temporal matching to associate detected license plates across video frames. Experiments are mainly carried out on the public large-scale video-based license plate dataset (LSV-LP) to validate the proposed methods.
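The tracking step described in the abstract associates detected plates across frames. The following is a minimal sketch of the temporal-matching part only, not the authors' implementation. It assumes greedy matching by intersection over union (IoU), which the abstract does not specify, and it assumes the per-track boxes have already been predicted for the current frame (for example by a Kalman filter, which is omitted here). The names `iou`, `match_tracks`, and `min_iou` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_tracks(predicted, detections, min_iou=0.3):
    """Greedily pair track ids with detection indices by descending IoU.

    predicted:  dict mapping track id -> box predicted for the current frame
                (assumed to come from a motion model such as a Kalman filter)
    detections: list of boxes detected in the current frame
    Returns (matches, unmatched_track_ids, unmatched_detection_indices).
    """
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in predicted.items()
         for j, det in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little (assumed threshold)
        if tid in used_tracks or j in used_dets:
            continue  # each track and each detection is used at most once
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    lost = [t for t in predicted if t not in used_tracks]
    new = [j for j in range(len(detections)) if j not in used_dets]
    return matches, lost, new


# Two existing tracks, three detections; the third detection is a new plate.
predicted = {1: (10, 10, 50, 30), 2: (100, 40, 140, 60)}
detections = [(102, 41, 142, 61), (12, 11, 52, 31), (300, 300, 340, 320)]
matches, lost, new = match_tracks(predicted, detections)
```

Unmatched detections would typically start new tracks and unmatched tracks would be kept alive for a few frames before being dropped. Greedy matching is used here for brevity; an optimal assignment solver is a common alternative.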
Pages: 14