Identification of Paragraph Regularities in Legal Judgements Through Clustering and Textual Embedding

Cited by: 0
Authors
De Martino, Graziella [1]
Pio, Gianvito [1, 2]
Affiliations
[1] Univ Bari Aldo Moro, Dept Comp Sci, Via Orabona 4, I-70125 Bari, Italy
[2] Natl Interuniv Consortium Informat, Big Data Lab, Via Ariosto 25, I-00185 Rome, Italy
Source
FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2022) | 2022 / Vol. 13515
Keywords
Legal information retrieval; Embedding; Clustering; Approximate nearest neighbor search
DOI
10.1007/978-3-031-16564-1_8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In an era of rapid technological progress, working in the legal field is very difficult without the support of the right tools. In this paper, we present a novel method, called JPReg, that identifies paragraph regularities in legal case judgments, i.e., paragraphs of existing documents that are similar to those of a document under preparation, to support legal experts during the preparation of new legal documents. JPReg adopts a two-step approach that first clusters similar documents according to their semantic content and then identifies regularities in the paragraphs within each cluster. Text embedding methods are adopted to represent documents and paragraphs in a numerical feature space, and an Approximate Nearest Neighbor Search method is adopted to efficiently retrieve the paragraphs most similar to those of a target document. Our extensive experimental evaluation, performed on a real-world dataset, shows the effectiveness and the computational efficiency of the proposed method, even in the presence of noise in the data.
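
The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the JPReg implementation: the abstract does not name the embedding model, the clustering algorithm, or the nearest-neighbor index, so everything below is an assumed stand-in. A sparse term-frequency vector replaces the learned text embedding, a greedy threshold ("leader") clustering replaces the document clustering step, and exact brute-force search replaces the Approximate Nearest Neighbor Search. The function names, the `threshold` value, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: sparse term-frequency vector (stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.6):
    """Step 1 (assumed stand-in): greedy leader clustering of whole documents.

    docs maps a document id to its list of paragraphs.
    Returns a dict mapping each document id to a cluster index.
    """
    leaders = []  # one representative vector per cluster
    labels = {}
    for doc_id, paragraphs in docs.items():
        vec = embed(" ".join(paragraphs))
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels[doc_id] = idx
                break
        else:
            # No existing cluster is close enough: start a new one.
            leaders.append(vec)
            labels[doc_id] = len(leaders) - 1
    return labels


def similar_paragraphs(target_id, docs, labels, k=1):
    """Step 2 (assumed stand-in): exact search restricted to the target's cluster.

    For each paragraph of the target document, returns the k most similar
    paragraphs from the other documents in the same cluster.
    """
    candidates = [
        (doc_id, paragraph)
        for doc_id, paragraphs in docs.items()
        if doc_id != target_id and labels[doc_id] == labels[target_id]
        for paragraph in paragraphs
    ]
    results = {}
    for paragraph in docs[target_id]:
        query = embed(paragraph)
        ranked = sorted(
            candidates,
            key=lambda cand: cosine(query, embed(cand[1])),
            reverse=True,
        )
        results[paragraph] = ranked[:k]
    return results


# Hypothetical sample data: A and B are appeal rulings, C is a lease dispute.
docs = {
    "A": ["the court dismissed the appeal",
          "costs are awarded to the respondent"],
    "B": ["the court dismissed the appeal as unfounded",
          "the respondent shall bear no costs"],
    "C": ["the tenant failed to pay rent",
          "the lease agreement is terminated"],
}
labels = cluster_documents(docs)
matches = similar_paragraphs("A", docs, labels, k=1)
```

Restricting the paragraph search to the target document's cluster is what keeps the second step cheap: only paragraphs from semantically related documents are compared. In a realistic setting the brute-force ranking would be replaced by an approximate index over dense embeddings, trading a small loss in accuracy for much faster retrieval.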
Pages: 74-84
Number of pages: 11