Visual and Textual Deep Feature Fusion for Document Image Classification

Cited by: 29
Authors
Bakkali, Souhail [1]
Ming, Zuheng [1]
Coustaty, Mickael [1]
Rusinol, Marcal [2]
Affiliations
[1] Univ La Rochelle, L3i, La Rochelle, France
[2] Univ Autonoma Barcelona, CVC, Barcelona, Spain
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020
DOI
10.1109/CVPRW50498.2020.00289
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The topic of text document image classification has been explored extensively over the past few years. Most recent approaches handle this task by jointly learning the visual features of document images and their corresponding textual contents. Because document images vary widely in structure, extracting semantic information from their textual content is beneficial for document image processing tasks such as document retrieval, information extraction, and text classification. In this work, a two-stream neural architecture is proposed to perform the document image classification task. We conduct an exhaustive investigation of widely used neural networks and word embedding procedures as backbones, in order to extract both visual and textual features from document images. Moreover, a joint feature learning approach that combines image features and text embeddings is introduced as a late fusion methodology. Both the theoretical analysis and the experimental results demonstrate the superiority of the proposed joint feature learning method over the single modalities. This joint learning approach outperforms the state-of-the-art results with a classification accuracy of 97.05% on the large-scale RVL-CDIP dataset.
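As a rough illustration of the late-fusion idea described in the abstract, the sketch below concatenates a visual feature vector and a text embedding and passes the result through a single linear layer with softmax. It is not the authors' implementation: the feature dimensions, the use of concatenation as the fusion operator, the random untrained weights, and all function and variable names are assumptions made for illustration. Only the class count (16, the number of RVL-CDIP categories) comes from the dataset itself.

```python
import math
import random

NUM_CLASSES = 16  # RVL-CDIP has 16 document classes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_classify(visual_feat, text_feat, weights, bias):
    """Concatenate per-stream features, then apply one linear layer + softmax.

    Concatenation is an assumed fusion operator, chosen for illustration only.
    """
    fused = list(visual_feat) + list(text_feat)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical stand-ins for the outputs of the two backbones.
rng = random.Random(0)
visual_feat = [rng.gauss(0, 1) for _ in range(8)]  # e.g. from an image network
text_feat = [rng.gauss(0, 1) for _ in range(4)]    # e.g. from a word embedding model

# Untrained, randomly initialised classifier head (illustrative only).
fused_dim = len(visual_feat) + len(text_feat)
weights = [[rng.gauss(0, 0.1) for _ in range(fused_dim)] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = late_fusion_classify(visual_feat, text_feat, weights, bias)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

In a trained system the two feature vectors would come from the visual and textual backbones, and the classifier head would be learned jointly with them; here the output is only a valid probability distribution over the 16 classes, not a meaningful prediction.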
Pages: 2394-2403
Page count: 10
Related papers
50 in total
  • [21] Multi-Model Fusion Framework Using Deep Learning for Visual-Textual Sentiment Classification
    Salman Al-Tameemi I.K.
    Feizi-Derakhshi M.-R.
    Pashazadeh S.
    Asadpour M.
    Computers, Materials and Continua, 2023, 76 (02): : 2145 - 2177
  • [22] A Textual Backdoor Defense Method Based on Deep Feature Classification
    Shao, Kun
    Yang, Junan
    Hu, Pengjiang
    Li, Xiaoshuai
    ENTROPY, 2023, 25 (02)
  • [23] ROBUST VISUAL TRACKING WITH DEEP FEATURE FUSION
    Wang, Guokun
    Wang, Jingjing
    Tang, Wenyi
    Yu, Nenghai
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 1917 - 1921
  • [24] A Shallow-to-Deep Feature Fusion Network for VHR Remote Sensing Image Classification
    Liu, Sicong
    Zheng, Yongjie
    Du, Qian
    Bruzzone, Lorenzo
    Samat, Alim
    Tong, Xiaohua
    Jin, Yanmin
    Wang, Chao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [25] A Feature Fusion Network for PolSAR Image Classification Based on Physical Features and Deep Features
    Hua, Wenqiang
    Hou, Qianjin
    Jin, Xiaomin
    Liu, Lin
    Sun, Nan
    Meng, Zhe
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [26] An efficient feature fusion in HSI image classification
    Srivastava, Vishal
    Biswas, Bhaskar
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2020, 31 (01) : 221 - 247
  • [27] Feature Fusion via Deep Residual Graph Convolutional Network for Hyperspectral Image Classification
    Chen, Rong
    Guanghui, Li
    Dai, Chenglong
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [28] Histopathological image classification based on cross-domain deep transferred feature fusion
    Wang, Pin
    Li, Pufei
    Li, Yongming
    Wang, Jiaxin
    Xu, Jin
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 68
  • [30] Image Classification with Superpixels and Feature Fusion Method
    Feng Yang
    Zheng Ma
    Mei Xie
    Journal of Electronic Science and Technology, 2021, 19 (01) : 70 - 78