A novel semantic information retrieval system based on a three-level domain model

Cited by: 11
Authors
Sbattella, Licia [1]
Tedesco, Roberto [2]
Affiliations
[1] Politecn Milan, Dipartimento Elettron & Informaz, I-20133 Milan, Italy
[2] Politecn Milan, MultiChancePoliTeam, I-20133 Milan, Italy
Keywords
HMM; MaxEnt; Ontology; WordNet; Semantic information retrieval; Word sense disambiguation; ONTOLOGY; EXTRACTION; KAPPA;
DOI
10.1016/j.jss.2013.01.029
Chinese Library Classification number
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract
This paper presents a methodology and a prototype for extracting and indexing knowledge from natural language documents. The underlying domain model relies on a conceptual level (described by means of a domain ontology), which represents the domain knowledge, and a lexical level (based on WordNet), which represents the domain vocabulary. A stochastic model (the ME-2L-HMM2, which mixes - in a novel way - HMM and maximum entropy models) stores the mapping between such levels, taking into account the linguistic context of words. Not only does such a context contain the surrounding words; it also contains morphologic and syntactic information extracted using natural language processing tools. The stochastic model is then used, during the document indexing phase, to disambiguate word meanings. The semantic information retrieval engine we developed supports simple keyword-based queries, as well as natural language-based queries. The engine is also able to extend the domain knowledge, discovering new and relevant concepts to add to the domain model. The validation tests indicate that the system is able to disambiguate and extract concepts with good accuracy. A comparison between our prototype and a classic search engine shows that the proposed approach is effective in providing better accuracy. (c) 2013 Elsevier Inc. All rights reserved.
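The abstract describes a stochastic model that maps words to concepts while taking the surrounding context into account. As a rough illustration of that general idea only, the following is a minimal sketch of Viterbi decoding over a plain first-order HMM, in which hidden states are coarse concept labels and observations are words. It is not the paper's ME-2L-HMM2: it has no maximum entropy component, no second level, and no morphologic or syntactic features. The concept labels (`Finance`, `Geography`), the function name `viterbi`, and all probabilities are hypothetical values chosen for the example.

```python
import math

# Hypothetical toy model: two coarse concept labels and made-up probabilities.
CONCEPTS = ["Finance", "Geography"]
START_P = {"Finance": 0.5, "Geography": 0.5}
TRANS_P = {
    "Finance": {"Finance": 0.8, "Geography": 0.2},
    "Geography": {"Finance": 0.2, "Geography": 0.8},
}
EMIT_P = {
    "Finance": {"bank": 0.5, "loan": 0.4, "river": 0.1},
    "Geography": {"bank": 0.3, "loan": 0.1, "river": 0.6},
}
UNSEEN = 1e-12  # small floor for words a concept never emitted


def viterbi(words, concepts=CONCEPTS, start_p=START_P,
            trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable concept sequence for a non-empty word list."""
    # best[t][c] = (log-probability of the best path ending in c at t, previous concept)
    best = [{}]
    for c in concepts:
        emit = math.log(emit_p[c].get(words[0], UNSEEN))
        best[0][c] = (math.log(start_p[c]) + emit, None)

    for t in range(1, len(words)):
        best.append({})
        for c in concepts:
            emit = math.log(emit_p[c].get(words[t], UNSEEN))
            score, prev = max(
                (best[t - 1][p][0] + math.log(trans_p[p][c]) + emit, p)
                for p in concepts
            )
            best[t][c] = (score, prev)

    # Backtrack from the best final concept.
    path = [max(concepts, key=lambda c: best[-1][c][0])]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["river", "bank"]))
    print(viterbi(["loan", "bank"]))
```

With these toy numbers, the ambiguous word "bank" receives a different concept label depending on the word that precedes it, which is the kind of context sensitivity the abstract attributes to its stochastic model.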
Pages: 1426-1452
Number of pages: 27