Sequence-aware multimodal page classification of Brazilian legal documents

Authors:

Pedro H. Luz de Araujo

Ana Paula G. S. de Almeida

Fabricio Ataides Braz

Nilton Correia da Silva

Flavio de Barros Vidal

Teofilo E. de Campos

Affiliations:

[1] Universidade de Brasília, Department of Computer Science

[2] Universidade de Brasília, Department of Mechanical Engineering

[3] University of Brasilia, Gama Faculty

Source:

International Journal on Document Analysis and Recognition (IJDAR) | 2023 / Vol. 26

Keywords:

Multimodal page classification; Document classification; Legal domain; Sequence classification; Portuguese language processing

DOI:

Not available

Abstract:

The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours on the initial analysis and classification of these cases, which diverts effort from later, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages), in which each page is manually annotated with one of six classes. Each lawsuit is an ordered sequence of pages, and each page is stored both as an image and as the corresponding text extracted through optical character recognition (OCR). We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the page images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on the document texts. We then use them as extractors of visual and textual features, which are combined by our proposed fusion module. The fusion module handles missing textual or visual input by substituting learned embeddings for the missing data. Moreover, we experiment with bidirectional long short-term memory (biLSTM) networks and linear-chain conditional random fields (CRFs) to model the sequential nature of the pages. The multimodal approaches outperform both the textual and the visual classifiers, especially when they leverage the sequential nature of the pages.
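The abstract names linear-chain CRFs as one way to model page order but does not give details. The following is a minimal sketch in Python of how such a model decodes a single lawsuit. It is not the authors' implementation. Given per-page class scores (for instance, scores computed from the fused visual and textual features) and a matrix of class-to-class transition scores, Viterbi decoding finds the label sequence with the highest total score. The function names `viterbi_decode` and `sequence_score` are hypothetical, and so are the three toy classes (the paper uses six) and all of the numbers. In a trained CRF, both score tables would be learned.

```python
from typing import List, Sequence

# Hypothetical sketch, not the paper's code: emission and transition
# scores are assumed to be given (in practice they would be learned).
Matrix = Sequence[Sequence[float]]


def sequence_score(emissions: Matrix, transitions: Matrix, labels: Sequence[int]) -> float:
    """Unnormalized linear-chain score: per-page emission scores plus
    transition scores between the labels of consecutive pages."""
    total = sum(emissions[t][k] for t, k in enumerate(labels))
    total += sum(transitions[a][b] for a, b in zip(labels, labels[1:]))
    return total


def viterbi_decode(emissions: Matrix, transitions: Matrix) -> List[int]:
    """Return the label sequence that maximizes sequence_score.

    emissions[t][k]: score of class k for page t.
    transitions[i][j]: score of class i on one page followed by class j on the next.
    """
    if not emissions:
        return []
    num_classes = len(emissions[0])
    # best[k]: highest score of any label path that ends in class k at the current page
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_classes):
            # previous class i that maximizes (path score so far + transition i -> j)
            prev = max(range(num_classes), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # start from the best final class and follow the backpointers to page 0
    path = [max(range(num_classes), key=lambda k: best[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores for 3 hypothetical classes over a 3-page lawsuit.
emissions = [[2.0, 0.0, 0.0],
             [0.0, 1.0, 1.1],
             [0.0, 2.0, 0.0]]
transitions = [[0.5, 0.0, -1.0],  # staying in a class earns a small bonus;
               [0.0, 0.5, 0.0],   # going from class 0 to class 2 is penalized
               [0.0, 0.0, 0.5]]

greedy = [max(range(3), key=lambda k: row[k]) for row in emissions]
path = viterbi_decode(emissions, transitions)
print(greedy)  # [0, 2, 1]
print(path)    # [0, 1, 1]
```

Taken in isolation, the middle page slightly favors class 2. Once the transition scores are included, however, the sequence [0, 1, 1] scores 5.5, while the page-by-page choice [0, 2, 1] scores 4.1. This is the kind of context that a sequence model can add over classifying each page independently.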

Pages: 33-49

Number of pages: 16