Visual and Textual Deep Feature Fusion for Document Image Classification

Cited by: 29
Authors
Bakkali, Souhail [1]
Ming, Zuheng [1]
Coustaty, Mickael [1]
Rusinol, Marcal [2]
Affiliations
[1] Univ La Rochelle, L3i, La Rochelle, France
[2] Univ Autonoma Barcelona, CVC, Barcelona, Spain
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020
DOI
10.1109/CVPRW50498.2020.00289
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text document image classification has been explored extensively over the past few years. Most recent approaches handle the task by jointly learning the visual features of document images and their corresponding textual content. Because document images vary widely in structure, extracting semantic information from their textual content benefits document image processing tasks such as document retrieval, information extraction, and text classification. In this work, a two-stream neural architecture is proposed to perform document image classification. We conduct an exhaustive investigation of currently widely used neural networks and word embedding procedures as backbones for extracting both visual and textual features from document images. Moreover, a joint feature learning approach that combines image features and text embeddings is introduced as a late fusion methodology. Both the theoretical analysis and the experimental results show that the proposed joint feature learning method outperforms either single modality. This joint learning approach surpasses previous state-of-the-art results, reaching a classification accuracy of 97.05% on the large-scale RVL-CDIP dataset.
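The late fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion operator the authors use, so the weighted averaging of per-stream class probabilities below is an assumption chosen for illustration, as are the function names, the `visual_weight` parameter, and the toy logits. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual_logits, text_logits, visual_weight=0.5):
    """Fuse two streams by a weighted average of their class probabilities.

    Assumption: averaging is one common late-fusion variant; the abstract
    does not say which operator the paper actually uses.
    """
    if len(visual_logits) != len(text_logits):
        raise ValueError("both streams must score the same set of classes")
    p_vis = softmax(visual_logits)
    p_txt = softmax(text_logits)
    return [
        visual_weight * v + (1.0 - visual_weight) * t
        for v, t in zip(p_vis, p_txt)
    ]


def predict(visual_logits, text_logits, visual_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(visual_logits, text_logits, visual_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy example with three hypothetical document classes.
visual_logits = [2.0, 1.0, 0.1]  # hypothetical output of the image stream
text_logits = [0.5, 3.0, 0.2]    # hypothetical output of the text stream
fused = late_fusion(visual_logits, text_logits)
predicted_class = predict(visual_logits, text_logits)
```

In this toy case the image stream alone would favor class 0, while the more confident text stream shifts the fused decision to class 1, which is the kind of complementary behavior a two-stream design aims for.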
Pages: 2394-2403
Number of pages: 10