ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

Cited by: 4
Authors
Oro, Ermelinda [1]
Ruffolo, Massimo [2]
Sacca, Domenico [1]
Affiliations
[1] Univ Calabria, Dept Elect Comp & Syst Sci, I-87036 Arcavacata Di Rende, CS, Italy
[2] Italian Natl Res Council, High Performance Comp & Networking Inst, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
Ontology-based information extraction; knowledge representation and reasoning; ontology; semantics; logic programming; attribute grammar; augmented transition network; PDF document; SYSTEM;
DOI
10.1142/S0218213009000354
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information extraction is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence, because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because of their completely unstructured nature, which poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The paper sketches the behavior and structure of the XONTO system by means of a running example.
Pages: 673-695
Number of pages: 23
Related papers
50 results in total
  • [1] Towards a System for Ontology-Based Information Extraction from PDF Documents
    Oro, Ermelinda
    Ruffolo, Massimo
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2008, PT II, PROCEEDINGS, 2008, 5332 : 1482 - 1499
  • [2] Ontology-Based Hazard Information Extraction from Chinese Food Complaint Documents
    Yang, Xiquan
    Gao, Rui
    Han, Zhengfu
    Sui, Xin
    ADVANCES IN SWARM INTELLIGENCE, ICSI 2012, PT II, 2012, 7332 : 155 - 163
  • [3] Ontology-based design information extraction and retrieval
    Li, Zhanjun
    Ramani, Karthik
    AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2007, 21 (02): : 137 - 154
  • [4] Ontology-Based Information Retrieval for Historical Documents
    Ramli, Fatihah
    Noah, Shahrul Azman
    Kurniawan, Tri Basuki
    2016 THIRD INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2016, : 55 - 59
  • [5] Ontology-Based Information Extraction from Spanish Forum
    Pena, Willy
    Melgar, Andres
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2015), PT I, 2015, 9329 : 351 - 360
  • [6] Ontology-Based Web Information Extraction
    Mo, Qian
    Chen, Yi-hong
    COMMUNICATIONS AND INFORMATION PROCESSING, PT 1, 2012, 288 : 118 - 126
  • [7] Ontology-based information retrieval and extraction
    Lee, CY
    Soo, VW
    ITRE 2005: 3RD INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: RESEARCH AND EDUCATION, PROCEEDINGS, 2005, : 265 - 269
  • [8] Ontology-based information extraction from the World Wide Web
    Korst, Jan
    Geleijnse, Gijs
    de Jong, Nick
    Verschoor, Michael
    INTELLIGENT ALGORITHMS IN AMBIENT AND BIOMEDICAL COMPUTING, 2006, 7 : 149 - +
  • [9] Ontology-based automated information extraction from building energy conservation codes
    Zhou, Peng
    El-Gohary, Nora
    AUTOMATION IN CONSTRUCTION, 2017, 74 : 103 - 117
  • [10] A hybrid ontology-based information extraction system
    Gutierrez, Fernando
    Dou, Dejing
    Fickas, Stephen
    Wimalasuriya, Daya
    Zong, Hui
    JOURNAL OF INFORMATION SCIENCE, 2016, 42 (06) : 798 - 820