MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

Cited by: 1

Authors:

Liu, Zhandong [1]

Zhou, Wengang [1]

Li, Houqiang [1]

Affiliation:

[1] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Dept Elect Engn & Informat Sci, 443 Huangshan Rd, Hefei 230027, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2021, Vol. 17, No. 3

Keywords:

Scene Text Detection; Multi-level Feature; Feature Fusion; Instance Segmentation;

DOI:

10.1145/3440087

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Recently, many scene text detection algorithms have achieved impressive performance by using convolutional neural networks. However, most of them do not make full use of the context among hierarchical multi-level features to improve detection performance. In this article, we present an efficient multi-level feature enhanced cumulative framework based on instance segmentation for scene text detection. First, we adopt a Multi-Level Features Enhanced Cumulative (MFEC) module to capture features whose representational ability is cumulatively enhanced. Then, a Multi-Level Features Fusion (MFF) module is designed to fully integrate both high-level and low-level MFEC features, which can adaptively encode scene text information. To verify the effectiveness of the proposed method, we perform experiments on six public datasets (namely, CTW1500, Total-Text, MSRA-TD500, ICDAR2013, ICDAR2015, and MLT2017) and make comparisons with other state-of-the-art methods. Experimental results demonstrate that the proposed Multi-Level Features Enhanced Cumulative Network (MFECN) detector handles scene text instances with irregular shapes (i.e., curved, oriented, and horizontal) well and achieves better or comparable results.

Pages: 22
