Petro NLP: Resources for natural language processing and information extraction for the oil and gas industry

Cited by: 0

Authors:

Cordeiro, Fabio Correa [1]

da Silva, Patricia Ferreira [2]

Tessarollo, Alexandre [2]

Freitas, Claudia [3,4]

de Souza, Elvis [3]

Gomes, Diogo da Silva Magalhaes [2]

Souza, Renato Rocha [1]

Coelho, Flavio Codeco [1]

Affiliations:

[1] Getulio Vargas Fdn, Praia Botafogo 190, BR-22250900 Rio De Janeiro, Brazil

[2] Petrobras Res & Dev Ctr CENPES, Ave Horacio Macedo 950, BR-21941915 Rio De Janeiro, Brazil

[3] Pontificia Univ Catolica Rio de Janeiro, Rua Marques Sao Vicente 225, BR-22451900 Rio de Janeiro, Brazil

[4] ICMC USP, Ave Trabalhador Sao Carlense 400, BR-13566590 Sao Carlos, Brazil

Source:

COMPUTERS & GEOSCIENCES | 2024, Vol. 193

Keywords:

Natural language processing; Information extraction; Ontology; Knowledge graphs; Linguistic corpora

DOI:

10.1016/j.cageo.2024.105714

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Most companies struggle to find and extract relevant information from their technical documents. In particular, the Oil and Gas (O&G) industry faces the challenge of dealing with large amounts of data hidden within old and new geoscientific reports collected over decades of operation. Making this information available in a structured format can unlock valuable information among these mountains of data, which is crucial to support a wide range of industrial and academic applications. However, most natural language processing resources were built from general domain corpora extracted from the Internet and primarily written in English. This paper presents Petro NLP, a comprehensive set of natural language processing and information extraction resources for the oil and gas industry in Portuguese. We connected an interdisciplinary team of geoscientists, linguists, computer scientists, petroleum engineers, librarians, and ontologists to build a knowledge graph and several annotated corpora. The Petro NLP resources comprise: (i) Petro KGraph, a knowledge graph populated with entities and relations commonly found on technical reports; and (ii) Petrolês, PetroGold, PetroNER, and PetroRE, sets of corpora containing raw text and documents annotated with morphosyntactic labels, named entities, and relations. These resources are fundamental infrastructure for future research in natural language processing and information extraction in the oil industry. Our ongoing research uses these datasets to train and enhance pre-trained machine learning models that automatically extract information from geoscientific technical documents.
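As a rough illustration of how text annotated with named entities and relations can feed a knowledge graph, the sketch below turns one annotated sentence into a (subject, relation, object) triple. It is a minimal sketch under stated assumptions, not the Petro NLP format: the class names, the `FIELD` and `BASIN` labels, the `located_in` relation, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labelled character span in a sentence (hypothetical schema)."""
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Relation:
    """A labelled link between two entities, given by their list indices."""
    head: int
    tail: int
    label: str


def make_entity(text: str, surface: str, label: str) -> Entity:
    """Locate the first occurrence of `surface` in `text` and label it."""
    start = text.index(surface)  # raises ValueError if the span is absent
    return Entity(start, start + len(surface), label)


def to_triples(text: str, entities: list[Entity], relations: list[Relation]):
    """Convert relation annotations into (head text, relation, tail text)."""
    triples = []
    for rel in relations:
        head = entities[rel.head]
        tail = entities[rel.tail]
        triples.append(
            (text[head.start:head.end], rel.label, text[tail.start:tail.end])
        )
    return triples


# Illustrative sentence and annotations (assumed, not taken from the corpora).
text = "O Campo de Marlim fica na Bacia de Campos."
entities = [
    make_entity(text, "Campo de Marlim", "FIELD"),
    make_entity(text, "Bacia de Campos", "BASIN"),
]
relations = [Relation(head=0, tail=1, label="located_in")]

triples = to_triples(text, entities, relations)
print(triples)
```

Storing entities as character offsets rather than copied strings keeps the annotation tied to the raw text, so the same sentence can carry several annotation layers without duplicating it.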


Pages: 13
