Scene text detection using structured information and an end-to-end trainable generative adversarial networks

Cited: 2
Authors
Naveen, Palanichamy [1 ]
Hassaballah, Mahmoud [2 ,3 ]
Affiliations
[1] KPR Inst Engn & Technol, Dept Elect & Elect Engn, Coimbatore, India
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, AlKharj 16278, Saudi Arabia
[3] South Valley Univ, Fac Comp & Informat, Dept Comp Sci, Qena, Egypt
Keywords
Text detection; Generative adversarial network; Variational autoencoder; GAN loss; VAE loss;
DOI
10.1007/s10044-024-01259-y
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. The text detection module then identifies text regions in the input image by assigning confidence scores to each region. Training the entire network end to end involves minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss. The VAE loss ensures diversity in generated text regions, the GAN loss guarantees realism and accuracy, and the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.
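The joint objective described above can be sketched in miniature. This is a minimal, hedged illustration only: the abstract does not state the exact form of each term or how they are weighted, so the sketch assumes the standard choices (mean-squared reconstruction error plus a Gaussian KL divergence for the VAE loss, binary cross-entropy for the GAN discriminator and for the per-region detection scores) and introduces hypothetical weights `w_vae`, `w_gan`, and `w_det`.

```python
import math

def vae_loss(x, x_recon, mu, logvar):
    """Assumed VAE loss: mean-squared reconstruction error plus the KL
    divergence of a diagonal Gaussian N(mu, exp(logvar)) from N(0, I)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    return recon + kl

def gan_loss(d_real, d_fake):
    """Assumed GAN discriminator loss (binary cross-entropy): scores on
    real regions are pushed toward 1, scores on generated regions toward 0."""
    eps = 1e-12
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def detection_loss(scores, labels):
    """Assumed detection loss: binary cross-entropy between per-region
    confidence scores and text/non-text ground-truth labels."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(scores, labels)) / len(scores)

def joint_loss(l_vae, l_gan, l_det, w_vae=1.0, w_gan=1.0, w_det=1.0):
    """Weighted sum minimized during end-to-end training; the weights
    here are illustrative placeholders, not values from the paper."""
    return w_vae * l_vae + w_gan * l_gan + w_det * l_det
```

In practice each term would be computed on tensors by an autodiff framework; the pure-Python version above only makes the structure of the combined objective concrete.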
Pages: 17