Scene text detection using structured information and an end-to-end trainable generative adversarial networks

Times cited: 2
Authors
Naveen, Palanichamy [1 ]
Hassaballah, Mahmoud [2 ,3 ]
Affiliations
[1] KPR Inst Engn & Technol, Dept Elect & Elect Engn, Coimbatore, India
[2] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, AlKharj 16278, Saudi Arabia
[3] South Valley Univ, Fac Comp & Informat, Dept Comp Sci, Qena, Egypt
Keywords
Text detection; Generative adversarial network; Variational autoencoder; GAN loss; VAE loss;
DOI
10.1007/s10044-024-01259-y
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and powerful text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. The GAN module then refines and enhances these regions, ensuring heightened realism and accuracy. Finally, the text detection module identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss: the VAE loss encourages diversity in the generated text regions, the GAN loss enforces realism and accuracy, and the text detection loss drives high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.
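To make the joint objective described in the abstract concrete, the following is a minimal PyTorch-style sketch of a combined loss built from a VAE term, a GAN generator term, and a text detection term. The specific loss formulations (MSE reconstruction plus KL divergence, non-saturating GAN loss, binary cross-entropy on region confidence scores) and the weighting factors lambda_vae, lambda_gan, and lambda_det are illustrative assumptions, not the authors' published design.

import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    # Assumed VAE objective: reconstruction error + KL divergence to N(0, I).
    recon = F.mse_loss(x_recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def gan_generator_loss(d_fake_logits):
    # Assumed non-saturating GAN loss for the generator:
    # push the discriminator to score refined regions as real.
    return F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))

def detection_loss(pred_scores, gt_labels):
    # Assumed confidence-score loss for text / non-text regions.
    return F.binary_cross_entropy_with_logits(pred_scores, gt_labels)

def joint_loss(x_recon, x, mu, logvar, d_fake_logits, pred_scores, gt_labels,
               lambda_vae=1.0, lambda_gan=0.1, lambda_det=1.0):
    # Weighted sum of the three terms; the weights are placeholder values.
    return (lambda_vae * vae_loss(x_recon, x, mu, logvar)
            + lambda_gan * gan_generator_loss(d_fake_logits)
            + lambda_det * detection_loss(pred_scores, gt_labels))

In such a setup, one backward pass through joint_loss would update the VAE encoder-decoder, the GAN generator, and the detection head together, which is one plausible reading of the end-to-end training the paper describes.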
Pages: 17
Related papers
50 items in total
  • [11] Cluttered TextSpotter: An End-to-End Trainable Light-Weight Scene Text Spotter for Cluttered Environment
    Bagi, Randheer
    Dutta, Tanima
    Gupta, Hari Prabhat
    IEEE ACCESS, 2020, 8 : 111433 - 111447
  • [12] EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition
    Hao, Jiedong
    Wen, Yafei
    Deng, Jie
    Gan, Jun
    Ren, Shuai
    Tan, Hui
    Chen, Xiaoxin
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 95 - 108
  • [13] Towards End-to-End Unified Scene Text Detection and Layout Analysis
    Long, Shangbang
    Qin, Siyang
    Panteleev, Dmitry
    Bissacco, Alessandro
    Fujii, Yasuhisa
    Raptis, Michalis
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1039 - 1049
  • [14] End-to-End Analysis for Text Detection and Recognition in Natural Scene Images
    Alnefaie, Ahlam
    Gupta, Deepak
    Bhuyan, Monowar H.
    Razzak, Imran
    Gupta, Prashant
    Prasad, Mukesh
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [15] Feature Fusion Pyramid Network for End-to-End Scene Text Detection
    Wu, Yirui
    Zhang, Lilai
    Li, Hao
    Zhang, Yunfei
    Wan, Shaohua
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (11)
  • [16] End-to-End Adversarial Learning for Intrusion Detection in Computer Networks
    Mohammadi, Bahram
    Sabokrou, Mohammad
    PROCEEDINGS OF THE IEEE LCN: 2019 44TH ANNUAL IEEE CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN 2019), 2019, : 270 - 273
  • [17] Scene text spotting based on end-to-end
    Wei G.
    Rong W.
    Liang Y.
    Xiao X.
    Liu X.
Journal of Intelligent and Fuzzy Systems, 2021, 40 (05): 8871 - 8881
  • [18] FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition
    Seong, Hongje
    Hyun, Junhyuk
    Kim, Euntai
    IEEE ACCESS, 2020, 8 (08) : 82066 - 82077
  • [19] A Text Detection and Recognition System based on an End-to-End Trainable Framework from UAV Imagery
    Wu, Qingtian
    Zhou, Yimin
    Liang, Guoyuan
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018, : 736 - 741
  • [20] Missing value imputation in multivariate time series with end-to-end generative adversarial networks
    Zhang, Ying
    Zhou, Baohang
    Cai, Xiangrui
    Guo, Wenya
    Ding, Xiaoke
    Yuan, Xiaojie
    INFORMATION SCIENCES, 2021, 551 : 67 - 82