Natural Language Processing Applied to Forensics Information Extraction With Transformers and Graph Visualization

Cited by: 9
Authors
Barros Rodrigues, Fillipe [1]
Ferreira Giozza, William [1]
de Oliveira Albuquerque, Robson [1, 2]
Garcia Villalba, Luis Javier [2]
Affiliations
[1] Univ Brasilia, Professional Postgrad Program Elect Engn, Dept Elect Engn, BR-70910900 Brasilia, DF, Brazil
[2] Univ Complutense Madrid UCM, Fac Comp Sci & Engn, Dept Software Engn & Artificial Intelligence DISI, Grp Anal Secur & Syst GASS, Madrid 28040, Spain
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / Issue 04
Keywords
Digital forensics; Task analysis; Natural language processing; Data mining; Information retrieval; Transformers; Training; named entity recognition (NER); natural language processing (NLP); relation extraction (RE); transformers; DIGITAL FORENSICS; SERVICE;
DOI
10.1109/TCSS.2022.3159677
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Digital forensics analysis is a slow process, mainly due to the large amount and variety of data. Some forensic tools help categorize files by type and allow the automation of tasks such as named entity recognition (NER). NER is a key component in many natural language processing (NLP) applications, such as relation extraction (RE) and information retrieval. The introduction of neural networks and transformer architectures in the last few years has made it possible to develop more accurate models in different languages. This work proposes a reproducible setup to build a forensic pipeline for information extraction from text using NLP. Our results show that it is possible to develop both NER and RE models in any language and tune their hyperparameters to achieve state-of-the-art performance and build comprehensive knowledge graphs, decreasing the amount of time required for human supervision and review. We also find that solving this task in phases can further improve performance, not only for digital investigation applications but also for general-purpose information extraction and analysis.
Pages: 4727-4743
Page count: 17