Scene text detection using structured information and an end-to-end trainable generative adversarial networks

Cited: 2
Authors
Naveen, Palanichamy [1 ]
Hassaballah, Mahmoud [2 ,3 ]
Affiliations
[1] KPR Inst Engn & Technol, Dept Elect & Elect Engn, Coimbatore, India
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, AlKharj 16278, Saudi Arabia
[3] South Valley Univ, Fac Comp & Informat, Dept Comp Sci, Qena, Egypt
Keywords
Text detection; Generative adversarial network; Variational autoencoder; GAN loss; VAE loss;
DOI
10.1007/s10044-024-01259-y
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. The text detection module then identifies text regions in the input image by assigning confidence scores to each region. The entire network is trained end-to-end by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss. The VAE loss ensures diversity in generated text regions, the GAN loss guarantees realism and accuracy, and the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.
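The abstract describes a joint objective combining the VAE loss (reconstruction plus KL divergence), the GAN loss, and a text detection loss over per-region confidence scores. The paper's exact formulation and weighting are not given here, so the following NumPy sketch is illustrative only: the loss terms use standard textbook forms (MSE reconstruction, non-saturating adversarial loss, binary cross-entropy on confidences), and the weights `w_vae`, `w_gan`, `w_det` are assumed hyperparameters, not values from the paper.

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    # Standard VAE objective: MSE reconstruction term plus
    # KL divergence of N(mu, exp(logvar)) from N(0, 1).
    recon = np.mean((x - x_recon) ** 2)
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon + kl

def gan_loss(d_real, d_fake, eps=1e-8):
    # Discriminator cross-entropy: push real scores toward 1,
    # generated (fake) scores toward 0.
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def detection_loss(conf, labels, eps=1e-8):
    # Binary cross-entropy between predicted per-region confidence
    # scores and ground-truth text/non-text labels.
    return -np.mean(labels * np.log(conf + eps)
                    + (1.0 - labels) * np.log(1.0 - conf + eps))

def joint_loss(x, x_recon, mu, logvar, d_real, d_fake, conf, labels,
               w_vae=1.0, w_gan=1.0, w_det=1.0):
    # Weighted sum of the three terms; the weights are assumed
    # hyperparameters (not specified in the abstract).
    return (w_vae * vae_loss(x, x_recon, mu, logvar)
            + w_gan * gan_loss(d_real, d_fake)
            + w_det * detection_loss(conf, labels))
```

In practice each term would be computed on minibatches inside an autodiff framework so gradients flow to all three modules; the sketch only shows how the scalar objective is assembled.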
Pages: 17
Related Papers
50 records
  • [1] Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
    Busta, Michal
    Neumann, Lukas
    Matas, Jiri
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2223 - 2231
  • [2] End-to-end scene text recognition using tree-structured models
    Shi, Cunzhao
    Wang, Chunheng
    Xiao, Baihua
    Gao, Song
    Hu, Jinlong
    PATTERN RECOGNITION, 2014, 47 (09) : 2853 - 2866
  • [3] End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks
    Xue, Alice
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3862 - 3870
  • [4] End-to-End Video-to-Speech Synthesis Using Generative Adversarial Networks
    Mira, Rodrigo
    Vougioukas, Konstantinos
    Ma, Pingchuan
    Petridis, Stavros
    Schuller, Bjoern W.
    Pantic, Maja
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (06) : 3454 - 3466
  • [5] LithoGAN: End-to-End Lithography Modeling with Generative Adversarial Networks
    Ye, Wei
    Alawieh, Mohamed Baker
    Lin, Yibo
    Pan, David Z.
    PROCEEDINGS OF THE 2019 56TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2019,
  • [6] Perceptual Conditional Generative Adversarial Networks for End-to-End Image Colourization
    Halder, Shirsendu Sukanta
    De, Kanjar
    Roy, Partha Pratim
    COMPUTER VISION - ACCV 2018, PT II, 2019, 11362 : 269 - 283
  • [7] FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks
    Yazdanbakhsh, Amir
    Brzozowski, Michael
    Khaleghi, Behnam
    Ghodrati, Soroush
    Samadi, Kambiz
    Kim, Nam Sung
    Esmaeilzadeh, Hadi
    PROCEEDINGS 26TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2018), 2018, : 65 - 72
  • [8] End-to-End Scene Text Recognition
    Wang, Kai
    Babenko, Boris
    Belongie, Serge
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 1457 - 1464
  • [9] End-PolarT: Polar Representation for End-to-End Scene Text Detection
    Wu, Yirui
    Kong, Qiran
    Qian, Cheng
    Nappi, Michele
    Wan, Shaohua
    BIG DATA RESEARCH, 2023, 34
  • [10] An End-to-End Scene Text Recognition for Bilingual Text
    Albalawi, Bayan M.
    Jamal, Amani T.
    Al Khuzayem, Lama A.
    Alsaedi, Olaa A.
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (09)