Systematic Literature Review of Information Extraction From Textual Data: Recent Methods, Applications, Trends, and Challenges

Cited by: 17
Authors
Abdullah, Mohd Hafizul Afifi [1]
Aziz, Norshakirah [1]
Abdulkadir, Said Jadid [1]
Alhussian, Hitham Seddig Alhassan [1]
Talpur, Noureen [1]
Affiliations
[1] Univ Teknol PETRONAS, Ctr Res Data Sci CeRDaS, Comp Informat Sci Dept, Seri Iskandar 32610, Malaysia
Keywords
Data mining; Hidden Markov models; Analytical models; Systematics; Market research; Task analysis; Feature extraction; Information extraction; text extraction; named entity; named entity recognition; relation extraction; event extraction; deep learning; NAMED-ENTITY RECOGNITION; CHINESE RELATION EXTRACTION; NEURAL-NETWORKS; MODEL; CONSTRUCTION; FRAMEWORK; DOCUMENTS; SUPPORT
DOI
10.1109/ACCESS.2023.3240898
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Information extraction (IE) is a challenging task, particularly when dealing with highly heterogeneous data. State-of-the-art data mining technologies struggle to process information from textual data. Therefore, various IE techniques have been developed to enable the use of IE for textual data. However, these techniques differ from one another because each is designed for a different data type and a different kind of target information. This study investigated and described the most contemporary methods for extracting information from textual data, emphasizing their benefits and shortcomings. To provide a holistic view of the domain, this comprehensive systematic literature review employed a systematic mapping process to summarize studies published in the last six years (from 2017 to 2022). It covers fundamental concepts, recent approaches, applications, and trends, in addition to challenges and future research prospects in this domain. Based on an analysis of 161 selected studies, we found that the state-of-the-art models employ deep learning to extract information from textual data. Finally, this study aimed to guide novice and experienced researchers in future research and serve as a foundation for this research area.
Pages: 10535-10562
Number of pages: 28