Deep representation learning for license plate recognition in low quality video images

Cited by: 0

Authors:

Zhao, Kemeng [1]

Peng, Liangrui [1]

Ding, Ning [1]

Yao, Gang [1]

Tang, Pei [1]

Wang, Shengjin [1,2]

Affiliations:

[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China

Source:

MACHINE VISION AND APPLICATIONS | 2025 / Volume 36 / Issue 03

Keywords:

License plate recognition; Object detection and tracking; Deep learning; Self-attention mechanism; Spatial transformer network;

DOI:

10.1007/s00138-025-01678-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

License plate recognition is an important technology in many application scenarios such as traffic monitoring and vehicle management. Due to variations in viewpoint, illumination, motion-blur, and degradation during the imaging process, it is still a challenging problem to detect and recognize license plates in low quality video images. In this paper, we focus on efficient deep representation learning for license plate recognition, detection and tracking. For license plate recognition, we mainly investigate the configuration of different network structures. We design a novel backbone network structure called SACNN, which combines convolutional neural network (CNN) and self-attention mechanism to learn non-linear representations for the structural patterns of characters in low quality video images. The proposed license plate recognition model employs the SACNN backbone network, a Long Short-Term Memory (LSTM) encoder and a Transformer decoder. For license plate detection, a Transformer encoder-decoder based method is adopted. To tackle the variations in license plate appearances and perspectives, an image rectification method is incorporated by using a spatial transformer network. For license plate tracking, a multi-object tracking method is incorporated by using Kalman filtering and temporal matching to associate detected license plates in video frames. Experiments are mainly carried out on the public large-scale video-based license plate dataset (LSV-LP) to validate the proposed methods.


Pages: 14

46 references in total

