RI for IR: Capturing term contexts using random indexing for comprehensive information retrieval

Cited by: 1

Authors:

Prasath, Rajendra [1, 2]

Sarkar, Sudeshna [1]

O’Reilly, Philip [2]

Affiliations:

[1] Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur

[2] Department of Business Information Systems, University College Cork, Cork

Source:

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014 / Vol. 8856

Keywords:

Cross-lingual information retrieval; Implicit semantic analysis; Random indexing; Retrieval effectiveness; Topic dynamics;

DOI:

10.1007/978-3-319-13647-9_12

Chinese Library Classification (CLC) number:

Subject classification code:

Abstract:

In this paper, we present an approach, based on random indexing, to identify semantically related information that effectively disambiguates the user query and improves the retrieval efficiency of news documents. User query terms are expanded with terms of similar word sense, which are discovered by implicitly considering the “associatedness” of the document context with that of the given query. This type of associatedness is guided by word-space models, as described by Kanerva et al. (2000). The word-space model computes the meaning of terms by implicitly utilizing the distributional patterns (contexts) of words collected over large text data. The distributional patterns represent semantic similarity between words in terms of their spatial proximity in the context space. In this space, words are represented by context vectors whose relative directions are assumed to indicate semantic similarity. Following the distributional hypothesis, words with similar meanings are assumed to have similar contexts. For example, if we observe two words that constantly occur in the same contexts, we are justified in assuming that they mean similar things. Hence the word-space methodology makes semantics computable, and the underlying models do not require any linguistic or semantic expertise. Experiments on the FIRE news collection show that the proposed approach effectively captures term contexts using higher-order term associations across the collection of news documents and uses this information to assist the retrieval of documents. © Springer International Publishing Switzerland 2014.
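The abstract describes context vectors, directional similarity, and query expansion only in words, so a small illustration of generic random indexing may help. The following is a minimal sketch, not the authors' implementation: the dimensionality, sparsity, window size, and all function names (`make_index_vector`, `build_context_vectors`, `cosine`, `expand_query`) are assumptions chosen for illustration, and it accumulates only first-order window co-occurrences rather than the higher-order associations the paper reports.

```python
import math
import random
from collections import defaultdict

DIM = 512      # dimensionality of the word space (assumed value)
NONZERO = 8    # non-zero entries per index vector (assumed value)
WINDOW = 2     # context window on each side of a term (assumed value)


def make_index_vector(rng):
    """Sparse ternary random vector, stored as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def build_context_vectors(sentences, seed=0):
    """Add the index vectors of neighbouring terms to each term's context vector."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = defaultdict(lambda: [0.0] * DIM)
    for tokens in sentences:
        # Every distinct term gets one fixed random index vector.
        for tok in tokens:
            if tok not in index_vectors:
                index_vectors[tok] = make_index_vector(rng)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in index_vectors[tokens[j]].items():
                    context_vectors[tok][pos] += sign
    return dict(context_vectors)


def cosine(u, v):
    """Cosine of the angle between two context vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(term, context_vectors, k=3):
    """Return the k terms whose context vectors point most nearly the same way."""
    if term not in context_vectors:
        return []
    target = context_vectors[term]
    scored = [
        (cosine(target, vec), other)
        for other, vec in context_vectors.items()
        if other != term
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]
```

Given a tokenized corpus such as `[["the", "car", "drives", "fast"], ["the", "automobile", "drives", "fast"]]`, calling `build_context_vectors` and then `expand_query("car", vectors, k=1)` would propose "automobile" as an expansion term, because the two words share the same neighbours. Storing index vectors sparsely keeps each update proportional to `NONZERO` rather than `DIM`, which is the usual reason random indexing scales to large collections.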

Citation

Pages: 104-112

Page count: 8