RMFPN: End-to-End Scene Text Recognition Using Multi-Feature Pyramid Network

Cited by: 2
Authors
Mahadshetti, Ruturaj [1 ]
Lee, Guee-Sang [1 ]
Choi, Deok-Jai [1 ]
Affiliations
[1] Chonnam Natl Univ, Dept Artificial Intelligence Convergence, Gwangju 61186, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Text recognition; Semantics; Feature extraction; Visualization; Task analysis; Linguistics; Image recognition; Deep learning; Convolutional neural networks; Scene text recognition; deep learning; convolutional neural network; transformer; multi-feature pyramid network;
DOI
10.1109/ACCESS.2023.3280547
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Scene text recognition (STR) plays an important role in many computer vision applications. STR has been an active research topic in the computer vision community, and deep learning-based STR methods have achieved impressive results over the past few years. However, even state-of-the-art scene text recognition approaches produce a notable number of incorrect predictions on images captured in real-world environments. Because such images lose precise text content, previous methods extract less robust features and weaker semantic information about the text. To address this issue, we propose a new approach called the Residual Multi-Feature Pyramid Network (RMFPN), which integrates ResNet and a Multi-Feature Pyramid Network to capture multi-level relations and improve the capacity and generalization of the feature extractor. We build RMFPN with two convolutional pyramids as the feature extractor, which strengthens the features and semantic information so that text can be recognized at various scales. Comprehensive experiments on diverse datasets demonstrate that the proposed method achieves strong recognition accuracy. RMFPN yields improvements of 0.61%, 1.2%, 1%, and 0.2% on the SVT, IC15, SVTP, and CUTE datasets, respectively.
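The abstract describes the feature extractor only at a high level, so the sketch below illustrates one plausible reading of it in PyTorch: a residual backbone producing multi-scale feature maps that are then fused by two stacked feature-pyramid passes. Every layer size, module name, and the exact fusion order here is an assumption made for illustration, not the published RMFPN configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Plain two-convolution residual block (channel-preserving)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class PyramidFusion(nn.Module):
    """One top-down pyramid pass: upsample the coarser map and add it to the finer one."""
    def __init__(self, channels):
        super().__init__()
        self.smooth = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1) for _ in range(2))

    def forward(self, feats):  # feats ordered fine -> coarse
        c2, c3, c4 = feats
        p4 = c4
        p3 = self.smooth[0](c3 + F.interpolate(p4, size=c3.shape[-2:], mode="nearest"))
        p2 = self.smooth[1](c2 + F.interpolate(p3, size=c2.shape[-2:], mode="nearest"))
        return [p2, p3, p4]

class RMFPNSketch(nn.Module):
    """Hypothetical backbone: residual stages followed by two stacked pyramid passes."""
    def __init__(self, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, stride=2, padding=1)
        # Three residual stages, each halving the spatial resolution.
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                ResidualBlock(channels),
            )
            for _ in range(3)
        )
        # "Two convolutional pyramids" is read here as two stacked fusion passes.
        self.pyramid1 = PyramidFusion(channels)
        self.pyramid2 = PyramidFusion(channels)

    def forward(self, x):
        x = F.relu(self.stem(x))
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return self.pyramid2(self.pyramid1(feats))

# Usage: a 32x128 text-line crop yields three fused multi-scale feature maps.
feats = RMFPNSketch()(torch.randn(1, 3, 32, 128))
print([tuple(f.shape) for f in feats])  # (1, 64, 8, 32), (1, 64, 4, 16), (1, 64, 2, 8)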
Pages: 61892-61900
Number of pages: 9