Visual and Textual Deep Feature Fusion for Document Image Classification

Cited by: 29

Authors:

Bakkali, Souhail [1]

Ming, Zuheng [1]

Coustaty, Mickael [1]

Rusinol, Marcal [2]

Affiliations:

[1] Univ La Rochelle, L3i, La Rochelle, France

[2] Univ Autonoma Barcelona, CVC, Barcelona, Spain

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020

Keywords:

DOI:

10.1109/CVPRW50498.2020.00289

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text document image classification has been explored extensively over the past few years. Most recent approaches handle this task by jointly learning the visual features of document images and their corresponding textual content. Because document images vary widely in structure, extracting semantic information from their textual content benefits document image processing tasks such as document retrieval, information extraction, and text classification. In this work, a two-stream neural architecture is proposed to perform document image classification. We conduct an exhaustive investigation of widely used neural networks and word embedding procedures as backbones for extracting both visual and textual features from document images. Moreover, a joint feature learning approach that combines image features and text embeddings is introduced as a late fusion methodology. Both the theoretical analysis and the experimental results demonstrate the superiority of the proposed joint feature learning method over the single modalities. This joint learning approach outperforms the state-of-the-art results with a classification accuracy of 97.05% on the large-scale RVL-CDIP dataset.
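The two-stream, late-fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the backbones or the fusion layers, so the sketch assumes the simplest variant, in which a visual feature vector and a text embedding are concatenated and passed to a single linear classifier with softmax. The function names, the toy feature sizes, and the random weights are all hypothetical; only the 16-class output matches the RVL-CDIP label set.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(visual_feat, text_feat):
    """Late fusion by concatenation (an assumption, not the paper's exact layer)."""
    return list(visual_feat) + list(text_feat)


def classify(visual_feat, text_feat, weights, bias):
    """Score the fused vector with one linear layer and return (label, probs)."""
    fused = fuse_features(visual_feat, text_feat)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Toy setup: sizes and values are hypothetical placeholders.
rng = random.Random(0)
NUM_CLASSES = 16            # RVL-CDIP has 16 document classes
VISUAL_DIM, TEXT_DIM = 8, 4  # stand-ins for real backbone output sizes

# In a real system these would come from an image network and a text encoder.
visual_feat = [rng.uniform(-1, 1) for _ in range(VISUAL_DIM)]
text_feat = [rng.uniform(-1, 1) for _ in range(TEXT_DIM)]

# Untrained random classifier weights, for shape checking only.
weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(VISUAL_DIM + TEXT_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

label, probs = classify(visual_feat, text_feat, weights, bias)
print("predicted class index:", label)
```

Concatenation keeps each stream's features intact and leaves the weighting of the two modalities to the classifier; averaging per-stream predictions would be another common late-fusion choice.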


Pages: 2394-2403

Page count: 10


