Natural Language Processing Applied to Forensics Information Extraction With Transformers and Graph Visualization

Cited by: 9
Authors
Barros Rodrigues, Fillipe [1]
Ferreira Giozza, William [1]
de Oliveira Albuquerque, Robson [1, 2]
Garcia Villalba, Luis Javier [2]
Affiliations
[1] Univ Brasilia, Professional Postgrad Program Elect Engn, Dept Elect Engn, BR-70910900 Brasilia, DF, Brazil
[2] Univ Complutense Madrid UCM, Fac Comp Sci & Engn, Dept Software Engn & Artificial Intelligence DISI, Grp Anal Secur & Syst GASS, Madrid 28040, Spain
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024, Vol. 11, No. 4
Keywords
Digital forensics; Task analysis; Natural language processing; Data mining; Information retrieval; Transformers; Training; named entity recognition (NER); natural language processing (NLP); relation extraction (RE); transformers; DIGITAL FORENSICS; SERVICE
DOI
10.1109/TCSS.2022.3159677
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification number
0812
Abstract
Digital forensic analysis is a slow process, mainly because of the large volume and variety of data. Some forensic tools help categorize files by type and allow the automation of tasks such as named entity recognition (NER). NER is a key component of many natural language processing (NLP) applications, such as relation extraction (RE) and information retrieval. The introduction of neural networks and transformer architectures in recent years has made it possible to develop more accurate models in different languages. This work proposes a reproducible setup for building a forensic pipeline that extracts information from text using NLP. Our results show that it is possible to develop both NER and RE models in any language, tune their hyperparameters to achieve state-of-the-art performance, and build comprehensive knowledge graphs, reducing the time required for human supervision and review. We also find that solving the task in phases can further improve performance, not only for digital investigation applications but also for general-purpose information extraction and analysis.
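The phased pipeline described in the abstract (NER first, then RE over the recognized entities, then a knowledge graph) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The gazetteer tagger and trigger-word relation step are hypothetical stand-ins for the transformer-based NER and RE models, and the entity labels, relation names, and function names (`extract_entities`, `extract_relations`, `build_graph`, `to_dot`) are illustrative assumptions. The last step emits Graphviz DOT text as one possible route to graph visualization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical label set: "PER", "ORG", "LOC"


# Phase 1: hypothetical stand-in for a transformer NER model.
# A token is tagged only if it appears in this toy gazetteer.
GAZETTEER = {"Alice": "PER", "Bob": "PER", "Acme": "ORG", "Madrid": "LOC"}

# Phase 2: hypothetical stand-in for an RE model.
# A trigger word in the sentence selects the relation label.
TRIGGERS = {"works": "works_for", "lives": "located_in", "called": "contacted"}


def tokenize(sentence):
    # Naive whitespace tokenizer that drops periods.
    return sentence.replace(".", "").split()


def extract_entities(sentence):
    """Phase 1: return entities in the order they occur in the sentence."""
    return [Entity(tok, GAZETTEER[tok]) for tok in tokenize(sentence) if tok in GAZETTEER]


def extract_relations(sentence, entities):
    """Phase 2: classify relations only between entities found in phase 1.

    Toy rule: the first entity is the head, the second is the tail.
    """
    if len(entities) < 2:
        return []
    words = set(tokenize(sentence))
    head, tail = entities[0], entities[1]
    return [(head, rel, tail) for trig, rel in TRIGGERS.items() if trig in words]


def build_graph(sentences):
    """Phase 3: merge (head, relation, tail) triples into an adjacency map."""
    graph = defaultdict(set)  # head Entity -> {(relation, tail Entity)}
    for sentence in sentences:
        entities = extract_entities(sentence)
        for head, rel, tail in extract_relations(sentence, entities):
            graph[head].add((rel, tail))
    return graph


def to_dot(graph):
    """Render the graph as Graphviz DOT text, with edges sorted for stable output."""
    lines = ["digraph kg {"]
    for head in sorted(graph, key=lambda e: e.text):
        for rel, tail in sorted(graph[head], key=lambda edge: (edge[0], edge[1].text)):
            lines.append(f'  "{head.text}" -> "{tail.text}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    docs = ["Alice works for Acme.", "Alice lives in Madrid.", "Bob called Alice."]
    print(to_dot(build_graph(docs)))
```

Running the script prints a DOT graph with three edges: Alice to Madrid (`located_in`), Alice to Acme (`works_for`), and Bob to Alice (`contacted`). Because the RE phase only considers entities that the NER phase has already found, each phase works on a narrower input than the one before it; the trade-off is that NER misses propagate into RE. A real pipeline would replace both stand-ins with trained models.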
Pages: 4727-4743 (17 pages)