Petro NLP: Resources for natural language processing and information extraction for the oil and gas industry

Cited by: 1

Authors:

Cordeiro, Fabio Correa [1]

da Silva, Patricia Ferreira [2]

Tessarollo, Alexandre [2]

Freitas, Claudia [3,4]

de Souza, Elvis [3]

Gomes, Diogo da Silva Magalhaes [2]

Souza, Renato Rocha [1]

Coelho, Flavio Codeco [1]

Affiliations:

[1] Getulio Vargas Foundation, Praia Botafogo 190, BR-22250900 Rio de Janeiro, Brazil

[2] Petrobras Research and Development Center (CENPES), Ave Horacio Macedo 950, BR-21941915 Rio de Janeiro, Brazil

[3] Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rua Marques Sao Vicente 225, BR-22451900 Rio de Janeiro, Brazil

[4] ICMC, University of Sao Paulo (USP), Ave Trabalhador Sao Carlense 400, BR-13566590 Sao Carlos, Brazil

Source:

Computers & Geosciences | 2024 / Vol. 193

Keywords:

Natural language processing; Information extraction; Ontology; Knowledge graphs; Linguistic corpora

DOI:

10.1016/j.cageo.2024.105714

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Most companies struggle to find and extract relevant information from their technical documents. The Oil and Gas (O&G) industry in particular must deal with large amounts of data hidden in old and new geoscientific reports collected over decades of operation. Making this information available in a structured format can unlock valuable knowledge buried in this mass of data, which is crucial to support a wide range of industrial and academic applications. However, most natural language processing resources were built from general-domain corpora extracted from the Internet and written primarily in English. This paper presents Petro NLP, a comprehensive set of Portuguese-language natural language processing and information extraction resources for the oil and gas industry. We assembled an interdisciplinary team of geoscientists, linguists, computer scientists, petroleum engineers, librarians, and ontologists to build a knowledge graph and several annotated corpora. The Petro NLP resources comprise: (i) Petro KGraph, a knowledge graph populated with entities and relations commonly found in technical reports; and (ii) Petrolês, PetroGold, PetroNER, and PetroRE, a set of corpora containing raw text and documents annotated with morphosyntactic labels, named entities, and relations. These resources are fundamental infrastructure for future research in natural language processing and information extraction in the oil industry. Our ongoing research uses these datasets to train and enhance pre-trained machine learning models that automatically extract information from geoscientific technical documents.
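The annotated corpora listed in the abstract pair raw text with labels such as named entities. The abstract does not state the file format; many named-entity corpora use token-level BIO tags (B- opens an entity, I- continues it, O marks tokens outside any entity), so the sketch below assumes that encoding and shows one way to turn a tagged sentence into entity spans. The function name `bio_to_spans`, the Portuguese example sentence, the well name `X-1`, and the labels `WELL` and `UNIT` are all hypothetical and are not taken from PetroNER.

```python
from typing import List, Optional, Tuple

# Hypothetical span type: (label, start_token, end_token_exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a sequence of B-/I-/O tags.

    Assumes a BIO encoding (not confirmed for the Petro NLP corpora).
    An I- tag whose label does not continue the open span is treated
    as the start of a new span, a common lenient convention.
    """
    spans: List[Span] = []
    label: Optional[str] = None  # label of the span currently open, if any
    start = 0
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new or tag == "O":
            # Close the open span before starting a new one or leaving it.
            if label is not None:
                spans.append((label, start, i))
            label = None
        if starts_new:
            label, start = tag[2:], i
        # Otherwise an I- tag continues the open span: nothing to do.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Invented example sentence with hypothetical labels.
tokens = ["O", "poço", "X-1", "atravessou", "a", "Formação", "Macabu", "."]
tags = ["O", "B-WELL", "I-WELL", "O", "O", "B-UNIT", "I-UNIT", "O"]
for label, s, e in bio_to_spans(tags):
    print(label, " ".join(tokens[s:e]))
# WELL poço X-1
# UNIT Formação Macabu
```

Using half-open token offsets keeps spans easy to slice and compare; a relation corpus could then refer to entities by these offsets, although how PetroRE actually encodes relations is not described here.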


Pages: 13
