Systematic Literature Review of Information Extraction From Textual Data: Recent Methods, Applications, Trends, and Challenges

Cited by: 18
Authors
Abdullah, Mohd Hafizul Afifi [1]
Aziz, Norshakirah [1]
Abdulkadir, Said Jadid [1]
Alhussian, Hitham Seddig Alhassan [1]
Talpur, Noureen [1]
Affiliations
[1] Univ Teknol PETRONAS, Ctr Res Data Sci CeRDaS, Comp Informat Sci Dept, Seri Iskandar 32610, Malaysia
Keywords
Data mining; Hidden Markov models; Analytical models; Systematics; Market research; Task analysis; Feature extraction; Information extraction; text extraction; named entity; named entity recognition; relation extraction; event extraction; deep learning; NAMED-ENTITY RECOGNITION; CHINESE RELATION EXTRACTION; NEURAL-NETWORKS; MODEL; CONSTRUCTION; FRAMEWORK; DOCUMENTS; SUPPORT;
DOI
10.1109/ACCESS.2023.3240898
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Information extraction (IE) is a challenging task, particularly when dealing with highly heterogeneous data. State-of-the-art data mining technologies struggle to process information from textual data. Therefore, various IE techniques have been developed to enable the use of IE for textual data. However, these techniques differ from one another because each is designed for a different data type and targets different information to be extracted. This study investigated and described the most contemporary methods for extracting information from textual data, emphasizing their benefits and shortcomings. To provide a holistic view of the domain, this comprehensive systematic literature review employed a systematic mapping process to summarize studies published in the last six years (from 2017 to 2022). It covers fundamental concepts, recent approaches, applications, and trends, in addition to challenges and future research prospects in this domain. Based on an analysis of 161 selected studies, we found that the state-of-the-art models employ deep learning to extract information from textual data. Finally, this study aimed to guide novice and experienced researchers in future research and serve as a foundation for this research area.
Pages: 10535-10562
Page count: 28