Arbitrary Shape Natural Scene Text Detection Method Based on Soft Attention Mechanism and Dilated Convolution

Cited by: 11
Authors
Qin, Xiao [1]
Jiang, Jianhui [1]
Yuan, Chang-An [1]
Qiao, Shaojie [2]
Fan, Wei [1]
Affiliations
[1] Nanning Normal Univ, Sch Comp & Informat Engn, Nanning 530299, Peoples R China
[2] Chengdu Univ Informat Technol, Sch Software Engn, Chengdu 610225, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text detection; deep learning; soft attention mechanism; dilated convolutions; Jaccard coefficient;
DOI
10.1109/ACCESS.2020.3007351
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Natural scene text detection has attracted much attention in computer vision research and is widely used in applications such as unmanned driving and robot sensing. Several methods have been proposed for horizontal and oriented text detection, but detecting text with irregular shapes and highly varied orientations remains a challenging problem. To tackle this problem, we propose a robust arbitrary-shape text detection method called Soft Dilated network (SDnet). The proposed method has two essential steps: (1) feature extraction by the backbone; (2) a post-processing approach that generates refined polygons or boundaries. In particular, the backbone is based on a soft attention mechanism and dilated convolution. The soft attention mechanism learns the importance of each feature channel, while dilated convolution effectively aggregates multi-scale contextual information without losing resolution and enhances the robustness of the network model. The proposed method can accurately detect curved text and efficiently discriminate between text and non-text areas. In addition, the Jaccard coefficient is used as the loss function to improve the post-processing capability of detecting sparsely arranged and arbitrary-shape text. With these techniques, the proposed method can effectively handle the detection of sparsely arranged, arbitrary-shape natural scene text. Experiments were conducted on three benchmark datasets: the curved text datasets CTW1500 and Total-Text, and the oriented text dataset ICDAR2015. The results show that, compared with state-of-the-art text detection methods, the proposed method is more robust and can find smaller text blocks in the image because its loss function is computed with the Jaccard coefficient. Furthermore, we performed multiple sets of ablation experiments, which verify the effectiveness of the proposed method.
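The abstract states that the Jaccard coefficient is used as the loss function but gives no formula. A minimal sketch of one common "soft" Jaccard (IoU) loss is shown below for illustration only; it is not taken from the paper. The function name `soft_jaccard_loss`, the smoothing term `eps`, and the use of flattened per-pixel score lists are all assumptions made here.

```python
def soft_jaccard_loss(pred, target, eps=1e-6):
    """Return 1 - soft Jaccard index of two flattened score maps.

    pred   : per-pixel text probabilities in [0, 1] (hypothetical model output)
    target : per-pixel ground-truth labels, 0 or 1
    eps    : assumed smoothing term that avoids division by zero
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Soft intersection: sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft union: |A| + |B| - |A intersect B|.
    union = sum(pred) + sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


if __name__ == "__main__":
    # Identical maps give a loss of 0; disjoint maps give a loss close to 1.
    print(soft_jaccard_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))
    print(soft_jaccard_loss([1.0, 0.0], [0.0, 1.0]))
```

Because the ratio depends on the overlap relative to the size of the text region rather than on the absolute pixel count, a loss of this kind does not let large background areas dominate, which is consistent with the abstract's remark about finding smaller text blocks.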
Pages: 122685-122694
Number of pages: 10