ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

Cited by: 4
Authors
Oro, Ermelinda [1]
Ruffolo, Massimo [2]
Sacca, Domenico [1]
Affiliations
[1] Univ Calabria, Dept Elect Comp & Syst Sci, I-87036 Arcavacata Di Rende, CS, Italy
[2] Italian Natl Res Council, High Performance Comp & Networking Inst, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
Ontology-based information extraction; knowledge representation and reasoning; ontology; semantics; logic programming; attribute grammar; augmented transition network; PDF document; SYSTEM;
DOI
10.1142/S0218213009000354
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information extraction is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence, because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because of their completely unstructured nature, which poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.
Pages: 673-695
Number of pages: 23