Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques

Cited by: 4
Authors
Qinjun Qiu
Zhong Xie
Liang Wu
Liufeng Tao
Affiliations
[1] China University of Geosciences, School of Geography and Information Engineering
[2] National Engineering Research Center of Geographic Information System
Source
Earth Science Informatics, 2020, Vol. 13
Keywords
Geoscience document; Knowledge graph; Geological text mining; Natural language processing
DOI
Not available
Abstract
A large amount of georeferenced quantitative data about rocks and geoscience surveys is buried in geological documents and remains unused. Data analytics and information extraction make it possible to use these data to improve our understanding of ore-forming processes and to enhance our knowledge. Extracting spatiotemporal and semantic information from a set of geological documents enables a rich representation of the geoscience knowledge recorded in unstructured Chinese text. This paper presents a workflow for spatiotemporal and semantic information extraction: a geological document analysis approach that uses automated techniques to browse and search relevant geological content. The workflow applies spatial and temporal gazetteer matching, pattern-based rules, and spatiotemporal relationship extraction to identify and label terms in geological text documents. It represents contextual information as a knowledge graph, extracts relevant tables and figures, and retrieves a list of relevant documents based on geological topic information. Text mining techniques are used to facilitate the analysis of geological knowledge and to show how text analysis can speed up the assessment of very large numbers of documents. In addition, keyword suggestions generated automatically from extracted keyword associations are used to reduce document search effort. This research illustrates the usefulness and effectiveness of the developed information extraction workflow and demonstrates the potential of combining text mining and NLP techniques in the geosciences.
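The gazetteer-matching and pattern-rule steps named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical: the toy gazetteer entries, the names `max_match`, `extract`, `GAZETTEER` and `AGE_PATTERN`, the rule for absolute ages in Ma, and the "co-occurs" relation label. Forward maximum matching is used here because Chinese text has no word delimiters. The paper's spatiotemporal relationship extraction is swapped out for a naive sentence-level co-occurrence heuristic.

```python
import re
from itertools import product

# Hypothetical toy gazetteer; a real system would load complete place-name
# and geological time-scale dictionaries. SPACE = place, TIME = era/epoch.
GAZETTEER = {
    "湖北省": "SPACE",    # Hubei Province
    "大冶市": "SPACE",    # Daye City
    "早白垩世": "TIME",   # Early Cretaceous
    "侏罗纪": "TIME",     # Jurassic
}

# Illustrative pattern-based rule: absolute ages such as "136 Ma" or "136.5Ma".
AGE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*Ma")

# Sentence delimiters commonly used in Chinese text.
SENTENCE_SPLIT = re.compile(r"[。；！？]")


def max_match(text, gazetteer):
    """Forward maximum matching: at each position take the longest gazetteer
    entry starting there. Chinese has no spaces, so we scan by character."""
    max_len = max(len(term) for term in gazetteer)
    hits, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in gazetteer:
                hits.append((candidate, gazetteer[candidate], i))
                i += size
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return hits


def extract(text):
    """Return (place, "co-occurs", time) triples for place and time mentions
    found in the same sentence; a naive stand-in for relation extraction."""
    triples = []
    for sentence in SENTENCE_SPLIT.split(text):
        if not sentence:
            continue
        hits = max_match(sentence, GAZETTEER)
        places = [term for term, label, _ in hits if label == "SPACE"]
        times = [term for term, label, _ in hits if label == "TIME"]
        times += AGE_PATTERN.findall(sentence)  # add rule-matched ages
        triples.extend((p, "co-occurs", t) for p, t in product(places, times))
    return triples


if __name__ == "__main__":
    text = "大冶市铁矿形成于早白垩世，成矿年龄约为136 Ma。湖北省出露侏罗纪地层。"
    for triple in extract(text):
        print(triple)
```

On the sample text, `extract` pairs 大冶市 (Daye) with both 早白垩世 (Early Cretaceous) and the rule-matched age `136 Ma`. It also pairs 湖北省 (Hubei) with 侏罗纪 (Jurassic). Triples of this shape could serve as edges of a knowledge graph. A production system would still need genuine relation classification rather than co-occurrence.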
Pages: 1393-1410
Page count: 17
Related papers
44 in total
  • [31] Automatic event identification and extraction from daily drilling reports using an expert system and artificial intelligence
    Cinelli, Lucas P.
    de Oliveira, Jose F. L.
    de Pinho, Vinicius M.
    Passos, Wesley L.
    Padilla, Rafael
    Braz, Patrick F.
    Galves, Breno
    Dalvi, Domenica P.
    Lewenfus, Gabriela
    Ferreira, Jonathas O.
    Ji, Anthony Y. Y.
    de Oliveira, Felipe L.
    Goncalves, Clemente J. C.
    Netto, Sergio L.
    da Silva, Eduardo A. B.
    de Campos, Marcello L. R.
    JOURNAL OF PETROLEUM SCIENCE AND ENGINEERING, 2021, 205
  • [32] Large language model-based information extraction from free-text radiology reports: a scoping review protocol
    Reichenpfader, Daniel
    Muller, Henning
    Denecke, Kerstin
    BMJ OPEN, 2023, 13 (12)
  • [33] Features level sentiment mining in enterprise systems from informal text corpus using machine learning techniques
    Panigrahi, Ritanjali
    Bele, Nishikant
    Panigrahi, Prabin Kumar
    Gupta, Brij B.
    ENTERPRISE INFORMATION SYSTEMS, 2024, 18 (05)
  • [35] Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning
    Park, Hyung Jun
    Park, Namu
    Lee, Jang Ho
    Choi, Myeong Geun
    Ryu, Jin-Sook
    Song, Min
    Choi, Chang-Min
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 22 (01)
  • [36] Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4)
    Truhn, Daniel
    Loeffler, Chiara M. L.
    Mueller-Franzes, Gustav
    Nebelung, Sven
    Hewitt, Katherine J.
    Brandner, Sebastian
    Bressem, Keno K.
    Foersch, Sebastian
    Kather, Jakob Nikolas
    JOURNAL OF PATHOLOGY, 2024, 262 (03): 310-319
  • [37] Automatic Extraction of Risk Factors for Dialysis Patients from Clinical Notes Using Natural Language Processing Techniques
    Michalopoulos, George
    Qazi, Hammad
    Wong, Alexander
    Butt, Zahid
    Chen, Helen
    DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270: 53-57
  • [38] Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study
    Yu, Amy Y. X.
    Liu, Zhongyu A.
    Pou-Prom, Chloe
    Lopes, Kaitlyn
    Kapral, Moira K.
    Aviv, Richard I.
    Mamdani, Muhammad
    JMIR MEDICAL INFORMATICS, 2021, 9 (05)
  • [39] Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system
    Fonferko-Shadrach, Beata
    Lacey, Arron S.
    Roberts, Angus
    Akbari, Ashley
    Thompson, Simon
    Ford, David V.
    Lyons, Ronan A.
    Rees, Mark I.
    Pickrell, William Owen
    BMJ OPEN, 2019, 9 (04)
  • [40] Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity
    Park, Briton
    Altieri, Nicholas
    DeNero, John
    Odisho, Anobel Y.
    Yu, Bin
    JAMIA OPEN, 2021, 4 (03)