Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

Cited by: 64
Authors
Afzal, Muhammad Zeshan [1, 3]
Koelsch, Andreas [1, 3]
Ahmed, Sheraz [2]
Liwicki, Marcus [1, 3, 4]
Affiliations
[1] Univ Kaiserslautern, MindGarage, Kaiserslautern, Germany
[2] DFKI, Kaiserslautern, Germany
[3] Insiders Technol GmbH, Kaiserslautern, Germany
[4] Univ Fribourg, Fribourg, Switzerland
Source
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1 | 2017
Keywords
Document Image Classification; Deep CNN; Convolutional Neural Network; Transfer Learning
DOI
10.1109/ICDAR.2017.149
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification, ultimately reducing the error by more than half. Existing approaches, such as the DeepDoc-Classifier, apply standard Convolutional Network architectures with transfer learning from the object recognition domain. The contribution of the paper is threefold. First, it investigates recently introduced very deep neural network architectures (GoogLeNet, VGG, ResNet) using transfer learning from real images. Second, it proposes transfer learning from a huge set of document images, i.e., 400,000 documents. Third, it analyzes the impact of the amount of training data (document images) and other parameters on classification performance. We use two datasets: Tobacco-3482 and the large-scale RVL-CDIP. We achieve an accuracy of 91.13% on Tobacco-3482, whereas earlier approaches reach only 77.6%, a relative error reduction of more than 60%. On the large RVL-CDIP dataset, we achieve an accuracy of 90.97%, corresponding to a relative error reduction of 11.5%.
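The "relative error reduction" quoted in the abstract can be checked from the two Tobacco-3482 accuracies. The snippet below is a minimal sketch, not code from the paper: the function name `relative_error_reduction` is hypothetical, and it assumes error is simply 100% minus accuracy.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the fraction of the baseline error removed by the new model.

    Both accuracies are percentages in [0, 100]. Assumes error = 100 - accuracy,
    which is an illustrative convention, not something the record states.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (baseline_err - new_err) / baseline_err


# Tobacco-3482 accuracies quoted in the abstract: 77.6% (earlier) vs 91.13% (this work).
tobacco = relative_error_reduction(77.6, 91.13)
print(f"{tobacco:.1%}")  # prints 60.4%
```

Under that assumption the error drops from 22.4% to 8.87%, which matches the "more than 60%" stated in the abstract. The RVL-CDIP baseline accuracy is not given in this record, so the 11.5% figure cannot be re-derived the same way here.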
Pages: 883-888
Page count: 6