ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

Cited by: 4

Authors:

Oro, Ermelinda [1]

Ruffolo, Massimo [2]

Sacca, Domenico [1]

Affiliations:

[1] Univ Calabria, Dept Elect Comp & Syst Sci, I-87036 Arcavacata Di Rende, CS, Italy

[2] Italian Natl Res Council, High Performance Comp & Networking Inst, I-87036 Arcavacata Di Rende, CS, Italy

Source:

INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS | 2009 / Vol. 18 / No. 05

Keywords:

Ontology-based information extraction; knowledge representation and reasoning; ontology; semantics; logic programming; attribute grammar; augmented transition network; PDF document; SYSTEM;

DOI:

10.1142/S0218213009000354

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Information extraction is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction (IE) systems cannot be applied to PDF documents because of their completely unstructured nature, which poses many issues in defining IE approaches. In this paper, the novel ontology-based system named XONTO, which allows the semantic extraction of information from PDF documents, is presented. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when information is arranged in tabular form. In this way, a self-describing ontology expresses the semantics of the information to extract and the rules that, in turn, populate the ontology itself. In the paper, the behavior and structure of the XONTO system are sketched by means of a running example.


Pages: 673-695

Page count: 23

