Natural Language Processing Applied to Forensics Information Extraction With Transformers and Graph Visualization

Cited by: 9
Authors
Barros Rodrigues, Fillipe [1]
Ferreira Giozza, William [1]
de Oliveira Albuquerque, Robson [1, 2]
Garcia Villalba, Luis Javier [2]
Affiliations
[1] Univ Brasilia, Professional Postgrad Program Elect Engn, Dept Elect Engn, BR-70910900 Brasilia, DF, Brazil
[2] Univ Complutense Madrid UCM, Fac Comp Sci & Engn, Dept Software Engn & Artificial Intelligence DISI, Grp Anal Secur & Syst GASS, Madrid 28040, Spain
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / Issue 04
Keywords
Digital forensics; Task analysis; Natural language processing (NLP); Data mining; Information retrieval; Transformers; Training; Named entity recognition (NER); Relation extraction (RE); Service
DOI
10.1109/TCSS.2022.3159677
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Digital forensics analysis is a slow process, mainly due to the large amount and variety of data. Some forensic tools help categorize files by type and allow the automation of tasks such as named entity recognition (NER). NER is a key component in many natural language processing (NLP) applications, such as relation extraction (RE) and information retrieval. The introduction of neural networks and transformer architectures in the last few years has made it possible to develop more accurate models in different languages. This work proposes a reproducible setup to build a forensic pipeline for information extraction from texts using NLP. Our results show that it is possible to develop both NER and RE models in any language and tune their hyperparameters to achieve state-of-the-art performance and build comprehensive knowledge graphs, decreasing the amount of time required for human supervision and review. We also find that solving this task in phases can further improve performance, not only for digital investigation applications but also for general-purpose information extraction and analysis.
Pages: 4727-4743
Number of pages: 17