Text visualization for geological hazard documents via text mining and natural language processing

Cited by: 0
Authors
Ying Ma
Zhong Xie
Gang Li
Kai Ma
Zhen Huang
Qinjun Qiu
Hui Liu
Affiliations
[1] National Engineering Research Center of Geographic Information System, School of Geography and Information Engineering
[2] China University of Geosciences, College of Computer and Information Technology
[3] Jinan Rail Transit Group Co., Ltd
[4] China Three Gorges University
[5] Wuhan Zondy Cyber Science & Technology Co., Ltd
Source
Earth Science Informatics | 2022 / Vol. 15
Keywords
Geological disaster report; Text mining; Natural language processing; Text visualization analysis
DOI
Not available
Abstract
A growing number of geological hazard documents describing the mechanisms and occurrence processes of geological disasters contain unstructured geoscientific data that are not fully utilized. Text mining and visualization techniques make it possible to exploit these data: they extract the core information from dense, abstract geological disaster reports so that readers can find it quickly, which improves the efficiency of report usage. This research describes a workflow framework that automatically extracts key information and transforms it into a simple, intuitive form that managers and researchers can quickly navigate, understand, and use to make more informed decisions. To extract key information from text automatically, an optimized term frequency-inverse document frequency (TF-IDF) algorithm is proposed to analyze text characteristics. The important information extracted from a case study document is presented as a word cloud. Co-occurrence network analysis is used to present key content from geological reports and to describe the correlations between words. Dependency grammar is used to extract triples from the text of geological reports, which are then visualized as knowledge graphs. The results show that text visualization analysis can identify the types and locations of geological disasters in reports, highlight key information from survey reports as an auxiliary public resource, and speed up analysis of the key contents of large numbers of geological disaster survey reports.
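As a rough illustration of two techniques the abstract names, the sketch below computes standard TF-IDF scores and sentence-level word co-occurrence counts. It is not the optimized TF-IDF variant proposed in the paper, whose details are not given here. The function names (`tf_idf`, `cooccurrence`) and the toy corpus are hypothetical, and tokenization is assumed to have been done already.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (standard TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def cooccurrence(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    pairs = Counter()
    for sentence in sentences:
        # Sorting gives each pair a single canonical key, e.g. ("a", "b").
        for a, b in combinations(sorted(set(sentence)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy corpus: each inner list stands in for one tokenized report.
docs = [
    ["landslide", "rainfall", "slope", "landslide"],
    ["debris", "flow", "rainfall", "valley"],
    ["collapse", "slope", "road"],
]

scores = tf_idf(docs)
# Terms of the first document, ranked from most to least distinctive.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)
# Pair counts that could serve as edge weights in a co-occurrence network.
pairs = cooccurrence(docs)
```

In this sketch the highest-scoring terms would feed a word cloud, and the pair counts would become edge weights in a co-occurrence graph. Extracting dependency-based triples would additionally need a syntactic parser, which is outside the scope of this example.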
Pages: 439-454
Page count: 15