Natural Language Processing Applications in Case-Law Text Publishing

Cited by: 2
Authors
Tarasconi, Francesco [1]
Botros, Milad [1]
Caserio, Matteo [1]
Sportelli, Gianpiero [1]
Giacalone, Giuseppe [2]
Uttini, Carlotta [2]
Vignati, Luca [2]
Zanetta, Fabrizio [2]
Affiliations
[1] CELI Language Technol, Via San Quintino 31, I-10121 Turin, Italy
[2] Giuffre Francis Lefebvre, Milan, Italy
Source
LEGAL KNOWLEDGE AND INFORMATION SYSTEMS | 2020 / Vol. 334
Keywords
natural language processing; applications; transfer learning; language models; text classification; information extraction; publishing industry; machine learning; BERT fine-tuning; random forest; Italian language
DOI
10.3233/FAIA200859
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Processing case-law content for electronic publishing is a time-consuming activity that encompasses several sub-tasks and usually involves adding annotations to the original text. At the same time, recent advances in Artificial Intelligence and Natural Language Processing enable the automatic and efficient analysis of large volumes of text. In this paper we present our Machine Learning solution to three specific business problems regularly met by a real-world Italian publisher in its day-to-day work: recognition of legal references in text spans, ranking of new content by relevance, and text classification according to a given tree of topics. We experimented with different approaches based on the BERT language model, together with alternatives typically based on Bag-of-Words. The optimal solution, deployed in a controlled production environment, was based on fine-tuned BERT in two out of three cases (the extraction of legal references and text classification), while for relevance ranking a Random Forest model with hand-crafted features was preferred. We conclude by discussing the concrete impact of the developed prototypes, as perceived by the publisher.
Pages: 154-163
Page count: 10