Petro NLP: Resources for natural language processing and information extraction for the oil and gas industry

Cited by: 1
Authors
Cordeiro, Fabio Correa [1]
da Silva, Patricia Ferreira [2]
Tessarollo, Alexandre [2]
Freitas, Claudia [3, 4]
de Souza, Elvis [3]
Gomes, Diogo da Silva Magalhaes [2]
Souza, Renato Rocha [1]
Coelho, Flavio Codeco [1]
Affiliations
[1] Getulio Vargas Fdn, Praia Botafogo 190, BR-22250900 Rio De Janeiro, Brazil
[2] Petrobras Res & Dev Ctr CENPES, Ave Horacio Macedo 950, BR-21941915 Rio De Janeiro, Brazil
[3] Pontificia Univ Catolica Rio de Janeiro, Rua Marques Sao Vicente 225, BR-22451900 Rio de Janeiro, Brazil
[4] ICMC USP, Ave Trabalhador Sao Carlense 400, BR-13566590 Sao Carlos, Brazil
Keywords
Natural language processing; Information extraction; Ontology; Knowledge graphs; Linguistic corpora
DOI
10.1016/j.cageo.2024.105714
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Most companies struggle to find and extract relevant information from their technical documents. In particular, the Oil and Gas (O&G) industry faces the challenge of dealing with large amounts of data hidden within old and new geoscientific reports collected over decades of operation. Making this information available in a structured format can unlock valuable knowledge buried in these mountains of data, which is crucial to support a wide range of industrial and academic applications. However, most natural language processing resources were built from general-domain corpora extracted from the Internet and written primarily in English. This paper presents Petro NLP, a comprehensive set of natural language processing and information extraction resources for the oil and gas industry in Portuguese. We brought together an interdisciplinary team of geoscientists, linguists, computer scientists, petroleum engineers, librarians, and ontologists to build a knowledge graph and several annotated corpora. The Petro NLP resources comprise: (i) Petro KGraph, a knowledge graph populated with entities and relations commonly found in technical reports; and (ii) Petrolês, PetroGold, PetroNER, and PetroRE, sets of corpora containing raw text and documents annotated with morphosyntactic labels, named entities, and relations. These resources are fundamental infrastructure for future research in natural language processing and information extraction in the oil industry. Our ongoing research uses these datasets to train and enhance pre-trained machine learning models that automatically extract information from geoscientific technical documents.
Pages: 13