Analyzing the potential of active learning for document image classification

Cited by: 0
Authors
Saifullah Saifullah
Stefan Agne
Andreas Dengel
Sheraz Ahmed
Affiliations
[1] German Research Center for Artificial Intelligence
[2] RPTU Kaiserslautern-Landau
[3] DeepReader GmbH
Source
International Journal on Document Analysis and Recognition (IJDAR) | 2023 / Vol. 26
Keywords
Document image classification; Document analysis; Active learning; Deep active learning
DOI
Not available
Abstract
Deep learning has been extensively researched in the field of document analysis and has shown excellent performance across a wide range of document-related tasks. As a result, a great deal of emphasis is now being placed on its practical deployment and integration into modern industrial document processing pipelines. It is well known, however, that deep learning models are data-hungry and often require huge volumes of annotated data to achieve competitive performance. Since data annotation is a costly and labor-intensive process, it remains one of the major hurdles to their practical deployment. This study investigates the possibility of using active learning (AL) to reduce the cost of data annotation in the context of document image classification, which is one of the core components of modern document processing pipelines. The results demonstrate that, by utilizing AL, deep document classification models can achieve performance competitive with models trained on fully annotated datasets and, in some cases, even surpass them while annotating only 15–40% of the total training dataset. Furthermore, this study demonstrates that modern AL strategies significantly outperform random querying and in many cases achieve performance comparable to models trained on fully annotated datasets, even in the presence of practical deployment issues such as data imbalance and annotation noise, and thus offer tremendous benefits for real-world deployment of deep document classification models. The code to reproduce our experiments is publicly available at https://github.com/saifullah3396/doc_al.
Pages: 187-209
Number of pages: 22