Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques

Cited by: 4
Authors
Qinjun Qiu
Zhong Xie
Liang Wu
Liufeng Tao
Affiliations
[1] China University of Geosciences, School of Geography and Information Engineering
[2] National Engineering Research Center of Geographic Information System
Source
Earth Science Informatics | 2020 / Vol. 13
Keywords
Geoscience document; Knowledge graph; Geological text mining; Natural language processing
DOI
Not available
Abstract
A large number of georeferenced quantitative data about rock and geoscience surveys are buried in geological documents and remain unused. Data analytics and information extraction offer opportunities to use this data for improved understanding of ore forming processes and to enhance our knowledge. Extracting spatiotemporal and semantic information from a set of geological documents enables us to develop a rich representation of the geoscience knowledge recorded in unstructured text written in Chinese. This paper presents the workflow for spatiotemporal and semantic information extraction, which is a geological document analysis approach that uses automated techniques for browsing and searching relevant geological content. The developed workflow applies spatial and temporal gazetteer matching, pattern-based rules and spatiotemporal relationship extraction to identify and label terms in geological text documents. It offers a representation of contextual information in knowledge graph form, extracts a set of relevant tables and figures, and queries a list of relevant documents by using geological topic information. Here, text mining techniques are used to facilitate the analysis of geological knowledge and to show the effectiveness of text analysis for improving the rapid assessment of a massive number of documents. Furthermore, autogenerated keyword suggestions derived from extracted keyword associations are used to reduce document search efforts. This research illustrates the usefulness and effectiveness of the developed information extraction workflow and demonstrates the potential of incorporating text mining and NLP techniques for geoscience.
Pages: 1393-1410
Page count: 17