SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Citations: 0
Authors
Bartz, Christian [1]
Yang, Haojin [1]
Meinel, Christoph [1]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany
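Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract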
Detecting and recognizing text in natural scene images is a challenging task that is not yet completely solved. In recent years, several new systems have been proposed that try to solve at least one of the two sub-tasks (text detection and text recognition). In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition that can be optimized end-to-end. Most existing works consist of multiple deep neural networks and several pre-processing steps. In contrast, we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way. SEE is a network that integrates and jointly learns a spatial transformer network, which can learn to detect text regions in an image, and a text recognition network that takes the identified text regions and recognizes their textual content. We introduce the idea behind our novel approach and show its feasibility by performing a range of experiments on standard benchmark datasets, where we achieve competitive results.
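The abstract does not describe an implementation. As an illustration only, the sketch below shows the sampling step that a spatial transformer generally relies on: a 2x3 affine matrix maps each output pixel to a source coordinate, and bilinear interpolation reads the value there. The names `affine_grid`, `bilinear_sample`, and `zoom` are hypothetical, and the matrix is fixed by hand here, whereas in a system like SEE it would presumably be predicted by the network. This is not the authors' code.

```python
import math

def affine_grid(theta, out_h, out_w):
    """For each output pixel, return the normalized source (x, y) in [-1, 1].

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]] (hand-set here).
    """
    def linspace(n):
        # n evenly spaced values covering [-1, 1]
        return [0.0] if n == 1 else [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    grid = []
    for yt in linspace(out_h):
        row = []
        for xt in linspace(out_w):
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid

def bilinear_sample(image, grid):
    """Read image values at the grid's source coordinates (border-clamped)."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalized coordinates to pixel coordinates and clamp.
            x = min(max((xs + 1.0) * 0.5 * (w - 1), 0.0), w - 1.0)
            y = min(max((ys + 1.0) * 0.5 * (h - 1), 0.0), h - 1.0)
            x0, y0 = int(math.floor(x)), int(math.floor(y))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = x - x0, y - y0
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            out_row.append(top * (1 - wy) + bottom * wy)
        out.append(out_row)
    return out

# Toy 5x5 "image" whose pixel value is 5 * row + column.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]

# Scaling by 0.5 with no translation selects the central region.
zoom = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
crop = bilinear_sample(image, affine_grid(zoom, 3, 3))
# crop is the central 3x3 block: rows 1..3, columns 1..3 of image
```

Because every step is a weighted sum, the crop is differentiable with respect to the matrix entries, which is what would allow a region selector of this kind to be trained jointly with a recognizer.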
Pages: 6674-6681
Number of pages: 8
Related papers
50 in total
  • [1] Semi-Supervised End-to-End Speech Recognition
    Karita, Shigeki
    Watanabe, Shinji
    Iwata, Tomoharu
    Ogawa, Atsunori
    Delcroix, Marc
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2 - 6
  • [2] SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION USING TEXT-TO-SPEECH AND AUTOENCODERS
    Karita, Shigeki
    Watanabe, Shinji
    Iwata, Tomoharu
    Delcroix, Marc
    Ogawa, Atsunori
    Nakatani, Tomohiro
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6166 - 6170
  • [3] End-to-End Scene Text Recognition
    Wang, Kai
    Babenko, Boris
    Belongie, Serge
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 1457 - 1464
  • [4] Improving End-to-End Bangla Speech Recognition with Semi-supervised Training
    Sadeq, Nafis
    Chowdhury, Nafis Tahmid
    Utshaw, Farhan Tanvir
    Ahmed, Shafayat
    Adnan, Muhammad Abdullah
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1875 - 1883
  • [5] Semi-Supervised Scene Text Recognition
    Gao, Yunze
    Chen, Yingying
    Wang, Jinqiao
    Lu, Hanqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 3005 - 3016
  • [6] An End-to-End Scene Text Recognition for Bilingual Text
    Albalawi, Bayan M.
    Jamal, Amani T.
    Al Khuzayem, Lama A.
    Alsaedi, Olaa A.
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (09)
  • [7] SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION VIA LOCAL PRIOR MATCHING
    Hsu, Wei-Ning
    Lee, Ann
    Synnaeve, Gabriel
    Hannun, Awni
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 125 - 132
  • [8] Towards Precise End-to-end Semi-Supervised Human Head Detection Network
    Li, Rongchun
    Zhang, Junjie
    Liu, Yuntao
    Dou, Yong
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [9] Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
    Dey, Subhadeep
    Motlicek, Petr
    Bui, Trung
    Dernoncourt, Franck
    INTERSPEECH 2019, 2019, : 734 - 738
  • [10] Towards End-to-End Semi-supervised Table Detection with Semantic Aligned Matching Transformer
    Shehzadi, Tahira
    Sarode, Shalini
    Stricker, Didier
    Afzal, Muhammad Zeshan
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 295 - 318