Natural Language Processing Applied to Forensics Information Extraction With Transformers and Graph Visualization

Cited by: 9

Authors:

Barros Rodrigues, Fillipe [1]

Ferreira Giozza, William [1]

de Oliveira Albuquerque, Robson [1,2]

Garcia Villalba, Luis Javier [2]

Affiliations:

[1] Univ Brasilia, Professional Postgrad Program Elect Engn, Dept Elect Engn, BR-70910900 Brasilia, DF, Brazil

[2] Univ Complutense Madrid UCM, Fac Comp Sci & Engn, Dept Software Engn & Artificial Intelligence DISI, Grp Anal Secur & Syst GASS, Madrid 28040, Spain

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / Issue 04

Keywords:

Digital forensics; Task analysis; Natural language processing; Data mining; Information retrieval; Transformers; Training; named entity recognition (NER); natural language processing (NLP); relation extraction (RE); transformers; DIGITAL FORENSICS; SERVICE;

DOI:

10.1109/TCSS.2022.3159677

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Digital forensic analysis is a slow process, mainly due to the large amount and variety of data. Some forensic tools help categorize files by type and allow the automation of tasks such as named entity recognition (NER). NER is a key component in many natural language processing (NLP) applications, such as relation extraction (RE) and information retrieval. The introduction of neural networks and transformer architectures in the last few years has made it possible to develop more accurate models in different languages. This work proposes a reproducible setup to build a forensic pipeline for information extraction from text using NLP. Our results show that it is possible to develop both NER and RE models in any language and tune their hyperparameters to achieve state-of-the-art performance and build comprehensive knowledge graphs, decreasing the amount of time required for human supervision and review. We also find that solving this task in phases can further improve performance, not only for digital investigation applications but also for general-purpose information extraction and analysis.

Citation

Pages: 4727-4743

Page count: 17

50 records in total

[1] Natural Language Processing: An Overview of Models, Transformers and Applied Practices
Canchila, Santiago
Meneses-Eraso, Carlos
Casanoves-Boix, Javier
Cortes-Pellicer, Pascual
Castello-Sirvent, Fernando
COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2024, 21 (03) : 1097 - 1145
[2] On the Benefits of Information Retrieval and Information Extraction Techniques Applied to Digital Forensics
Lillis, David
Scanlon, Mark
ADVANCED MULTIMEDIA AND UBIQUITOUS ENGINEERING: FUTURETECH & MUE, 2016, 393 : 641 - 647
[3] Transformers: "The End of History" for Natural Language Processing?
Chernyavskiy, Anton
Ilvovsky, Dmitry
Nakov, Preslav
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III, 2021, 12977 : 677 - 693
[4] On the Validity of Pre-Trained Transformers for Natural Language Processing in the Software Engineering Domain
von der Mosel, Julian
Trautsch, Alexander
Herbold, Steffen
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (04) : 1487 - 1507
[5] Natural Language Processing Technology for Spatial Information Recognition and Visualization
Vicentiy, A. V.
ARTIFICIAL INTELLIGENCE TRENDS IN SYSTEMS, VOL 2, 2022, 502 : 512 - 520
[6] Natural language processing with transformers: a review
Tucudean, Georgiana
Bucos, Marian
Dragulescu, Bogdan
Caleanu, Catalin Daniel
PEERJ COMPUTER SCIENCE, 2024, 10
[8] A Word Sense Disambiguation Method Applied to Natural Language Processing for the Portuguese Language
do Nascimento, Clovis Holanda
Garcia, Vinicius Cardoso
Araujo, Ricardo de Andrade
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, 5 : 268 - 277
[9] The application of natural language processing for the extraction of mechanistic information in toxicology
Corradi, Marie
Luechtefeld, Thomas
de Haan, Alyanne M.
Pieters, Raymond
Freedman, Jonathan H.
Vanhaecke, Tamara
Vinken, Mathieu
Teunis, Marc
FRONTIERS IN TOXICOLOGY, 2024, 6
[10] Biological gene extraction path based on knowledge graph and natural language processing
Zhang, Canlin
Cao, Xiaopei
FRONTIERS IN GENETICS, 2023, 13
