Towards open-set text recognition via label-to-prototype learning

Cited by: 32
Authors
Liu, Chang [1]
Yang, Chun [1]
Qin, Hai-Bo [1]
Zhu, Xiaobin [1]
Liu, Cheng-Lin [2]
Yin, Xu-Cheng [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Dept Comp Sci & Technol, Beijing 100083, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Open-set recognition; Scene text recognition; Low-shot recognition; NETWORK; CLASSIFICATION
DOI
10.1016/j.patcog.2022.109109
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Scene text recognition is a popular research topic which is also extensively utilized in the industry. Although many methods have achieved satisfactory performance for the close-set text recognition challenges, these methods lose feasibility in open-set scenarios, where collecting data or retraining models for novel characters could yield a high cost. For example, annotating samples for foreign languages can be expensive, whereas retraining the model each time a "novel" character is discovered from historical documents costs both time and resources. In this paper, we introduce and formulate a new open-set text recognition task which demands the capability to spot and recognize novel characters without retraining. A label-to-prototype learning framework is also proposed as a baseline for the new task. Specifically, the framework introduces a generalizable label-to-prototype mapping function to build prototypes (class centers) for both seen and unseen classes. An open-set predictor is then utilized to recognize or reject samples according to the prototypes. The implementation of rejection capability over out-of-set characters allows automatic spotting of unknown characters in the incoming data stream. Extensive experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets. (c) 2022 Elsevier Ltd. All rights reserved.
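The recognize-or-reject idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`label_to_prototype`, `build_prototypes`, `predict_open_set`), the use of cosine similarity, the threshold value, and the toy descriptors are all assumptions made for illustration. In particular, `label_to_prototype` only normalizes a hand-made descriptor here, whereas the paper describes a learned, generalizable mapping.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def label_to_prototype(label_descriptor):
    """Hypothetical stand-in for a learned label-to-prototype mapping.

    Assumption: each label comes with some descriptor vector; this toy
    version just normalizes it to obtain the class center.
    """
    return normalize(label_descriptor)


def build_prototypes(label_descriptors):
    """Build one prototype (class center) per label."""
    return {label: label_to_prototype(d) for label, d in label_descriptors.items()}


def predict_open_set(feature, prototypes, reject_threshold=0.8):
    """Return (label, score), or (None, score) when the sample is rejected.

    Assumption: similarity is cosine similarity, and a sample is rejected
    as out-of-set when its best similarity falls below the threshold.
    """
    feature = normalize(feature)
    best_label, best_score = None, -1.0
    for label, proto in prototypes.items():
        score = sum(f * p for f, p in zip(feature, proto))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < reject_threshold:
        return None, best_score
    return best_label, best_score


# Toy descriptors for two "seen" characters (illustrative values only).
prototypes = build_prototypes({"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]})

seen_result = predict_open_set([0.9, 0.1, 0.0], prototypes)
rejected_before = predict_open_set([0.0, 0.0, 2.0], prototypes)

# A novel character is added by mapping its descriptor to a prototype;
# nothing is retrained in this sketch.
prototypes["c"] = label_to_prototype([0.0, 0.0, 1.0])
recognized_after = predict_open_set([0.0, 0.0, 2.0], prototypes)
```

In this sketch, a sample near the "a" prototype is recognized, a sample unlike any known prototype is rejected (label `None`), and the same sample is recognized once a prototype for the novel class "c" has been added.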
Pages: 13