Scene text detection using structured information and an end-to-end trainable generative adversarial networks

Cited by: 2
Authors
Naveen, Palanichamy [1]
Hassaballah, Mahmoud [2, 3]
Affiliations
[1] KPR Inst Engn & Technol, Dept Elect & Elect Engn, Coimbatore, India
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, AlKharj 16278, Saudi Arabia
[3] South Valley Univ, Fac Comp & Informat, Dept Comp Sci, Qena, Egypt
Keywords
Text detection; Generative adversarial network; Variational autoencoder; GAN loss; VAE loss
DOI
10.1007/s10044-024-01259-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text detection is challenging because text varies widely in appearance, background, and orientation. Improving robustness, accuracy, and efficiency in this setting is vital for applications such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. The VAE module generates diverse and variable text regions. The GAN module then refines these regions to improve their realism and accuracy. Finally, the text detection module identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained end to end by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss. The VAE loss promotes diversity in the generated text regions, the GAN loss promotes realism and accuracy, and the text detection loss promotes high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
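The joint loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the VAE term is taken as mean squared reconstruction error plus a Gaussian KL divergence, the GAN term as a non-saturating generator loss, the detection term as binary cross-entropy on confidence scores, and the weights `w_vae`, `w_gan`, and `w_det` are hypothetical.

```python
import math


def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped for stability."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def joint_loss(recon, target, mu, log_var, disc_on_fake, det_scores, det_labels,
               w_vae=1.0, w_gan=1.0, w_det=1.0):
    """Weighted sum of the three terms (weights are assumed, not from the paper)."""
    # Assumed VAE term: reconstruction error plus KL regularizer.
    vae_loss = mse(recon, target) + kl_divergence(mu, log_var)
    # Assumed GAN term: generator wants the discriminator to output 1 on fakes.
    gan_loss = bce(disc_on_fake, [1.0] * len(disc_on_fake))
    # Assumed detection term: per-region confidence vs. ground-truth label.
    det_loss = bce(det_scores, det_labels)
    total = w_vae * vae_loss + w_gan * gan_loss + w_det * det_loss
    return total, {"vae": vae_loss, "gan": gan_loss, "det": det_loss}


if __name__ == "__main__":
    total, parts = joint_loss(
        recon=[0.9, 0.1], target=[1.0, 0.0],
        mu=[0.0, 0.0], log_var=[0.0, 0.0],
        disc_on_fake=[0.5],
        det_scores=[0.8, 0.2], det_labels=[1.0, 0.0],
    )
    print(total, parts)
```

In an end-to-end setup, a single scalar such as `total` would be backpropagated through all three modules at once; the per-term dictionary is returned only to make each contribution easy to monitor.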
Pages: 17