MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

被引:1
作者
Liu, Zhandong [1 ]
Zhou, Wengang [1 ]
Li, Houqiang [1 ]
机构
[1] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Dept Elect Engn & Informat Sci, 443 Huangshan Rd, Hefei 230027, Peoples R China
关键词
Scene Text Detection; Multi-level Feature; Feature Fusion; Instance Segmentation;
D O I
10.1145/3440087
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recently, many scene text detection algorithms have achieved impressive performance by using convolutional neural networks. However, most of them do not make full use of the context among the hierarchical multi-level features to improve the performance of scene text detection. In this article, we present an efficient multi-level features enhanced cumulative framework based on instance segmentation for scene text detection. At first, we adopt a Multi-Level Features Enhanced Cumulative (MFEC) module to capture features of cumulative enhancement of representational ability. Then, a Multi-Level Features Fusion (MFF) module is designed to fully integrate both high-level and low-level MFEC features, which can adaptively encode scene text information. To verify the effectiveness of the proposed method, we perform experiments on six public datasets (namely, CTW1500, Total-text, MSRA-TD500, ICDAR2013, ICDAR2015, and MLT2017), and make comparisons with other state-of-the-art methods. Experimental results demonstrate that the proposed Multi-Level Features Enhanced Cumulative Network (MFECN) detector can well handle scene text instances with irregular shapes (i.e., curved, oriented, and horizontal) and achieves better or comparable results.
引用
收藏
页数:22
相关论文
共 77 条
  • [1] Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
    Ch'ng, Chee Kheng
    Chan, Chee Seng
    [J]. 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 935 - 942
  • [2] Deep Cross-Modal Audio-Visual Generation
    Chen, Lele
    Srivastava, Sudhanshu
    Duan, Zhiyao
    Xu, Chenliang
    [J]. PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 349 - 357
  • [3] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [4] Structure-Aware Deep Learning for Product Image Classification
    Chen, Zhineng
    Al, Shanshan
    Jia, Caiyan
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [5] Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention
    Cornia, Marcella
    Baraldi, Lorenzo
    Serra, Giuseppe
    Cucchiara, Rita
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (02)
  • [6] Dai YC, 2018, INT C PATT RECOG, P3604, DOI 10.1109/ICPR.2018.8546066
  • [7] Deng D, 2018, AAAI CONF ARTIF INTE, P6773
  • [8] Ding, 2019, ARXIV PREPRINT ARXIV
  • [9] Everingham M., 2010, INT J COMPUT VISION, V88, P303, DOI DOI 10.1007/s11263-009-0275-4
  • [10] Detect-and-Track:Efficient Pose Estimation in Videos
    Girdhar, Rohit
    Gkioxari, Georgia
    Torresani, Lorenzo
    Paluri, Manohar
    Tran, Du
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 350 - 359