ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

Cited by: 4
Authors
Oro, Ermelinda [1]
Ruffolo, Massimo [2]
Sacca, Domenico [1]
Affiliations
[1] Univ Calabria, Dept Elect Comp & Syst Sci, I-87036 Arcavacata Di Rende, CS, Italy
[2] Italian Natl Res Council, High Performance Comp & Networking Inst, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
Ontology-based information extraction; knowledge representation and reasoning; ontology; semantics; logic programming; attribute grammar; augmented transition network; PDF document; SYSTEM
DOI
10.1142/S0218213009000354
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Information extraction (IE) is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated, and indeed complex, approaches have been proposed, they remain limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because their completely unstructured nature poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The paper sketches the behavior and structure of the XONTO system by means of a running example.
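Illustrative sketch (not from the paper): the idea of a class that carries its own descriptors and uses them to populate itself can be approximated in a few lines of Python. Everything here is an assumption made for illustration: the `OntologyClass` type, the `populate` method, the "Company" class, and the sample sentence are hypothetical, and regular expressions stand in for XONTO's actual descriptor language, which the keywords suggest is grounded in attribute grammars and logic programming. The sketch works on plain text only and does not address PDF layout or tabular data.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical ontology class that carries its own extraction rules."""
    name: str
    # Descriptors: patterns that recognize instances of this class in text.
    # Regexes with named groups are a stand-in, not XONTO's rule syntax.
    descriptors: list
    instances: list = field(default_factory=list)

    def populate(self, text):
        """Apply every descriptor to the text and store each match as an instance."""
        for pattern in self.descriptors:
            for match in pattern.finditer(text):
                self.instances.append(match.groupdict())
        return self.instances


# A class "Company" whose single descriptor recognizes "<Name> <legal form>".
company = OntologyClass(
    name="Company",
    descriptors=[re.compile(r"(?P<name>[A-Z][A-Za-z]+) (?P<form>Inc|Ltd|SpA)\b")],
)

text = "Acme Inc acquired Rossi SpA in 2008."
company.populate(text)
print(company.instances)
```

The point of the design is that the extraction rules live inside the ontology rather than in a separate extractor, so adding a class with its descriptors is enough to make that class extractable.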
Pages: 673-695
Page count: 23