End-to-end dilated convolution network for document image semantic segmentation

Cited by: 0
Authors
Can-hui Xu
Cao Shi
Yi-nong Chen
Affiliations
[1] Qingdao University of Science and Technology,School of Information Sciences and Technology
[2] Arizona State University,School of Computing, Informatics and Decision Systems Engineering
Source
Journal of Central South University | 2021 / Vol. 28
Keywords
semantic segmentation; document images; deep learning; NVIDIA Jetson Nano
DOI
Not available
Abstract
Semantic segmentation is a crucial step in document understanding. In this paper, an NVIDIA Jetson Nano-based platform is used to implement semantic segmentation as a vehicle for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have the well-known advantage of extracting multi-scale context information without losing spatial resolution. Our model combines dilated convolutions with a residual network to represent image features and predict pixel labels. The convolution part works as a feature extractor that obtains multidimensional, hierarchical image features. The subsequent deconvolution produces a full-resolution segmentation prediction. The class probabilities predicted for each pixel determine which predefined semantic class label it receives. To understand the effect of segmentation granularity, we compare performance at three levels, from fine-grained classes to coarse classes, and evaluate the proposed architecture on three document datasets. The experimental results show that both imbalance in the distribution of semantic classes and network depth are important factors influencing document semantic segmentation performance. The research aims to offer an educational resource for teaching artificial intelligence concepts and techniques.
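The two mechanisms named in the abstract, dilated convolution that keeps spatial resolution and per-pixel labeling from class probabilities, can be illustrated with a small sketch. This is not the authors' network: it is a pure-Python, one-dimensional toy, and the function names, the zero padding, and the argmax labeling rule are all assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 1D dilated convolution (cross-correlation form).

    Assumes an odd kernel length so the output has the same length
    as the input, i.e. no spatial resolution is lost.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    out = []
    for i in range(len(signal)):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(kernel[j] * padded[i + j * dilation] for j in range(k)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def label_pixels(prob_map):
    """Assign each pixel the index of its most probable class (assumed argmax rule).

    prob_map[y][x] is a list of class probabilities for that pixel.
    """
    return [[max(range(len(p)), key=p.__getitem__) for p in row]
            for row in prob_map]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # With dilation 2, the three taps reach samples two positions apart.
    print(dilated_conv1d(impulse, [1, 1, 1], dilation=2))
    # Three 3-tap layers with dilations 1, 2, 4 see 15 input samples.
    print(receptive_field(3, [1, 2, 4]))
    # Two pixels, two hypothetical classes (e.g. 0 = text, 1 = figure).
    print(label_pixels([[[0.1, 0.9], [0.7, 0.3]]]))
```

Stacking layers with growing dilation widens the context each output sample sees while the output length stays equal to the input length, which is the property the abstract refers to as extracting multi-scale context without losing spatial resolution.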
Pages: 1765-1774
Page count: 9