Open Information Extraction as Additional Source for Kazakh Ontology Generation

Cited by: 1

Authors:

Khairova, Nina [1]

Petrasova, Svitlana [1]

Mamyrbayev, Orken [2]

Mukhsina, Kuralay [3]

Affiliations:

[1] Natl Tech Univ Kharkiv Polytech Inst, Kyrpychova St, UA-61002 Kharkiv, Ukraine

[2] Inst Informat & Computat Technol, 125 Pushkin St, Alma Ata 050010, Kazakhstan

[3] Al Farabi Kazakh Natl Univ, 71 Al Farabi Ave, Alma Ata, Kazakhstan

Source:

INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I | 2020 / Vol. 12033

Keywords:

Open Information Extraction; RDF-triplets; Unstructured text; Logical-linguistic equations; Kazakh bilingual news websites;

DOI:

10.1007/978-3-030-41964-6_8

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Structured information obtained from unstructured texts and Web content can serve as an additional source of knowledge for creating ontologies. To extract information from a text and represent it in the RDF-triplets format, we suggest using the Open Information Extraction model. We then consider adapting the model to fact extraction from unstructured texts in the Kazakh language. In our approach, we identify the lexical units that name the participants of the action (the Subject and the Object) and the semantic relations between them based on the characteristics of words in a sentence. The model provides the semantic functions of the action participants via logical-linguistic equations that express the relations between the grammatical and semantic characteristics of the words in a Kazakh sentence. Using the tag names and some syntactic characteristics of words in Kazakh sentences as the values of the predicate variables in the corresponding equations allows us to extract the Subjects, Objects and Predicates of facts from Web content texts. The experimental dataset consists of texts extracted from Kazakh bilingual news websites. The experiment shows that fact extraction can achieve a precision of over 71% on the Kazakh corpus.
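The idea of treating tag values as predicate variables can be illustrated with a minimal Python sketch. It is not the authors' set of logical-linguistic equations, which this record does not reproduce. The tag names (`NOUN`, `VERB`, `nom`, `acc`), the `Token` type, the three role predicates and the hand-tagged toy sentence are all assumptions made for illustration: a nominative noun is taken as the Subject, an accusative noun as the Object, and a verb as the Predicate. The `precision` helper only restates the usual definition (correct facts divided by extracted facts).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    pos: str   # hypothetical tag set, e.g. "NOUN", "VERB"
    case: str  # hypothetical case tag: "nom", "acc", or "" if not applicable


# Each function acts as one boolean condition over a word's tags.
# These rules are illustrative assumptions, not the paper's equations.
def is_subject(tok: Token) -> bool:
    return tok.pos == "NOUN" and tok.case == "nom"


def is_object(tok: Token) -> bool:
    return tok.pos == "NOUN" and tok.case == "acc"


def is_predicate(tok: Token) -> bool:
    return tok.pos == "VERB"


def extract_triple(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object), or None if any role is missing."""
    subj = next((t.text for t in tokens if is_subject(t)), None)
    pred = next((t.text for t in tokens if is_predicate(t)), None)
    obj = next((t.text for t in tokens if is_object(t)), None)
    if subj is None or pred is None or obj is None:
        return None
    return (subj, pred, obj)


def precision(correct: int, extracted: int) -> float:
    """Share of extracted facts that are correct; 0.0 if nothing was extracted."""
    return correct / extracted if extracted else 0.0


# Hand-tagged toy sentence (assumed tags): "The child read the book."
sentence = [
    Token("Бала", "NOUN", "nom"),
    Token("кітапты", "NOUN", "acc"),
    Token("оқыды", "VERB", ""),
]
triple = extract_triple(sentence)
print(triple)
```

A real pipeline would take the tags from a morphological analyser and would need rules for multi-word participants, which this sketch leaves out.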


Pages: 86-96

Number of pages: 11

23 references in total

[1]

Angeli G, 2015, PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1, P344

[2]

[Anonymous], 2010, P 19 INT C WORLD WID, DOI 10.1145/1772690.1772814

[3]

[Anonymous], 2012, P JOINT WORKSHOP AUT

[4]

ARPA, 1991, P 3 MESS UND C

[5]

Duc-Thuan V, 2016, ENCY SEMANTIC COMPUT, V1, P1

[6]

Etzioni O, Banko M, Soderland S, Weld D S, 2008, Open Information Extraction from the Web, COMMUNICATIONS OF THE ACM, V51(12), P68-74

[7]

Fader A, 2011, P C EMP METH NAT LAN, P1535

[8]

Gamallo P, Garcia M, 2015, Multilingual Open Information Extraction, PROGRESS IN ARTIFICIAL INTELLIGENCE-BK, V9273, P711-722

[9]

Gamallo P, 2012, Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP, P10

[10]

Gashteovski K, 2017, P 2017 C EMP METH NA, P2630, DOI 10.18653/v1/D17-1278
