Noisy-Aware Unsupervised Domain Adaptation for Scene Text Recognition

Cited by: 0

Authors:

Liu, Xiao-Qian [1]

Zhang, Peng-Fei [2]

Luo, Xin [1]

Huang, Zi [2]

Xu, Xin-Shun [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China

[2] Univ Queensland, Sch Elect Engn & Comp Sci, Brisbane, Qld 4072, Australia

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

National Natural Science Foundation of China;

Keywords:

Text recognition; domain adaptation; entropy; noisy-aware; consistency regularization; NETWORK;

DOI:

10.1109/TIP.2024.3492705

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unsupervised Domain Adaptation (UDA) has shown promise in Scene Text Recognition (STR) by facilitating knowledge transfer from labeled synthetic text (source) to more challenging unlabeled real scene text (target). However, existing UDA-based STR methods fully rely on the pseudo-labels of target samples, which ignores the impact of domain gaps (inter-domain noise) and various natural environments (intra-domain noise), resulting in poor pseudo-label quality. In this paper, we propose a novel noisy-aware unsupervised domain adaptation framework tailored for STR, which aims to enhance model robustness against both inter- and intra-domain noise, thereby providing more precise pseudo-labels for target samples. Concretely, we propose reweighting target pseudo-labels by estimating the entropy of refined probability distributions, which mitigates the impact of domain gaps on pseudo-labels. Additionally, a decoupled triple-P-N consistency matching module is proposed, which leverages data augmentation to increase data diversity, enhancing model robustness in diverse natural environments. Within this module, we design a low-confidence-based character negative learning, which is decoupled from high-confidence-based positive learning, thus improving sample utilization under scarce target samples. Furthermore, we extend our framework to the more challenging Source-Free UDA (SFUDA) setting, where only a pre-trained source model is available for adaptation, with no access to source data. Experimental results on benchmark datasets demonstrate the effectiveness of our framework. Under the SFUDA setting, our method exhibits faster convergence and superior performance with less training data than previous UDA-based STR methods. Our method surpasses representative STR methods, establishing new state-of-the-art results across multiple datasets.
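The abstract mentions reweighting target pseudo-labels by the entropy of predicted probability distributions. The following is only a minimal sketch of that general idea, not the paper's actual formulation: the function names `entropy` and `entropy_weight`, the averaging over characters, and the normalization by the maximum entropy are all assumptions made for illustration, and the "refined" distributions described in the abstract are not modeled here.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weight(char_probs):
    """Hypothetical weight in [0, 1] for one pseudo-labeled text sequence.

    char_probs: list of per-character probability distributions,
    each a list of class probabilities that sums to 1.
    The weight is 1 minus the mean entropy divided by the maximum
    possible entropy, so confident (low-entropy) predictions get
    weights near 1 and near-uniform predictions get weights near 0.
    This normalization is an assumption, not taken from the paper.
    """
    num_classes = len(char_probs[0])
    max_entropy = math.log(num_classes)
    mean_entropy = sum(entropy(p) for p in char_probs) / len(char_probs)
    return 1.0 - mean_entropy / max_entropy


# Two toy sequences over a 4-character alphabet.
confident = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]

w_confident = entropy_weight(confident)
w_uncertain = entropy_weight(uncertain)
```

In a training loop, such a weight would typically multiply the per-sample pseudo-label loss, so that target samples with uncertain predictions contribute less to the update.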


Pages: 6550-6563

Page count: 14
