RI for IR: Capturing term contexts using random indexing for comprehensive information retrieval

Cited by: 1
Authors
Prasath, Rajendra [1, 2]
Sarkar, Sudeshna [1]
O’Reilly, Philip [2]
Affiliations
[1] Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur
[2] Department of Business Information Systems, University College Cork, Cork
Source
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014 / Vol. 8856
Keywords
Cross-lingual information retrieval; Implicit semantic analysis; Random indexing; Retrieval effectiveness; Topic dynamics
DOI
10.1007/978-3-319-13647-9_12
Abstract
In this paper, we present an approach, based on random indexing, to identify semantically related information that effectively disambiguates the user query and improves the retrieval effectiveness for news documents. User query terms are expanded with terms of similar word senses, which are discovered by implicitly considering the “associatedness” of the document context with that of the given query. This associatedness is guided by word-space models, as described by Kanerva et al. (2000). The word-space model computes the meaning of terms by implicitly utilizing the distributional patterns (contexts) of words collected over large text data. These distributional patterns express semantic similarity between words as spatial proximity in the context space: words are represented by context vectors whose relative directions are assumed to indicate semantic similarity. This follows the distributional hypothesis, under which words with similar meanings are assumed to occur in similar contexts. For example, if two words consistently occur in the same contexts, we are justified in assuming that they mean similar things. Hence the word-space methodology makes semantics computable, and the underlying models do not require any linguistic or semantic expertise. Experiments on the FIRE news collection show that the proposed approach effectively captures term contexts using higher-order term associations across the collection of news documents and uses this information to assist document retrieval. © Springer International Publishing Switzerland 2014.
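The word-space idea summarised in the abstract can be illustrated with a minimal random-indexing sketch. It is not the authors' implementation. The dimensionality (512), the number of non-zero entries per index vector (8), the ±2-token window, the toy corpus, and the helper names `make_index_vector`, `train_context_vectors`, `cosine`, and `expand_query` are all hypothetical choices made for illustration. Each term receives a sparse random ternary index vector; a term's context vector is the sum of the index vectors of its neighbours; each query term is then expanded with the terms whose context vectors point in the most similar direction (cosine similarity).

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector, stored as (position, +1/-1) pairs."""
    positions = rng.sample(range(dim), nonzeros)
    return [(pos, rng.choice((1, -1))) for pos in positions]


def train_context_vectors(docs, dim=512, nonzeros=8, window=2, seed=0):
    """Return {term: context vector}; each context vector is the sum of the
    index vectors of all terms seen within +/- `window` tokens of the term.
    Parameter defaults are illustrative assumptions, not values from the paper."""
    rng = random.Random(seed)
    index = {}
    for doc in docs:
        for term in doc:
            if term not in index:
                index[term] = make_index_vector(dim, nonzeros, rng)

    context = defaultdict(lambda: [0] * dim)
    for doc in docs:
        for i, term in enumerate(doc):
            ctx = context[term]
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # a term is not part of its own context
                for pos, sign in index[doc[j]]:
                    ctx[pos] += sign
    return dict(context)


def cosine(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query, vectors, k=2):
    """Append, for each query term, the k most similar terms not yet present."""
    expanded = list(query)
    for q in query:
        if q not in vectors:
            continue  # unseen term: nothing to expand with
        scored = sorted(
            ((cosine(vectors[q], vec), term)
             for term, vec in vectors.items() if term not in expanded),
            reverse=True,
        )
        expanded.extend(term for _, term in scored[:k])
    return expanded


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenised.
    docs = [
        ["the", "bank", "approved", "the", "loan"],
        ["the", "lender", "approved", "the", "loan"],
        ["the", "river", "flooded", "its", "banks"],
    ]
    vectors = train_context_vectors(docs)
    print(expand_query(["bank"], vectors, k=1))
```

In the toy corpus, "bank" and "lender" never co-occur but have exactly the same neighbours, so their context vectors coincide and expanding "bank" picks up "lender". This is the shared-context (second-order) association the abstract relies on; the sketch does not model how the paper captures higher-order associations across the collection or how it weights and filters expansion terms.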
Pages: 104–112
Page count: 8