Semantic clustering based relevance language model

Cited by: 0
Authors
Pu Q. [2 ]
He D. [1 ]
Affiliations
[1] School of Information Sciences, University of Pittsburgh, Pittsburgh
[2] School of Computer Science and Engineering, University of Electronic Science and Technology of China, 610054, Chengdu, Sichuan
Keywords
Independent component analysis; Information retrieval; Pseudo relevance feedback; Query expansion; Relevance language model; Semantic clustering
DOI
10.3923/itj.2010.236.246
Abstract
How to generate clusters effectively, and how to use the information in them to improve information retrieval performance, remain open research questions. Viewing a document as an interaction of a set of independent hidden topics, we propose a novel semantic clustering technique based on independent component analysis. Within the language modeling framework, we then apply the resulting semantic topic clusters to the estimation of the relevance model. We expect semantic clustering to filter out noisy documents, so that the relevance model is estimated only from relevant documents and useful semantic information. The user's query activates the semantic cluster most similar to the user's information need; the documents in the activated cluster and the keywords representing it are then used to estimate the relevance model. The result is a semantic cluster based relevance language model that uses pseudo relevance feedback and requires no relevance training information. We evaluated the model in experiments on five TREC data sets. The results show that our model significantly improves retrieval performance over previous language models, including relevance-based language models. We attribute most of the improvement to estimating the relevance model on the semantic cluster that is closely related to the user's information need. © 2010 Asian Network for Scientific Information.
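The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster labels are already available (the paper derives them with independent component analysis, which is not reproduced here), it uses Jelinek-Mercer smoothing with a hypothetical weight `lam`, and it omits the cluster keywords that the paper also feeds into the estimate. The function name `cluster_relevance_model` and its helpers are illustrative only. The query activates the cluster with the highest query likelihood, and a relevance model is then estimated from the documents of that cluster alone.

```python
from collections import Counter
from math import prod


def unigram(tokens):
    """Maximum-likelihood unigram model over a token list."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}


def smoothed(model, background, w, lam):
    """Jelinek-Mercer smoothing: mix the model with the collection model."""
    return (1 - lam) * model.get(w, 0.0) + lam * background.get(w, 0.0)


def cluster_relevance_model(query, docs, labels, lam=0.5):
    """Return (activated cluster, P(w|R)) estimated on that cluster only.

    `labels` are assumed to come from some prior clustering step;
    `lam` is an assumed smoothing weight, not a value from the paper.
    """
    background = unigram([w for d in docs for w in d])

    def query_likelihood(model):
        return prod(smoothed(model, background, q, lam) for q in query)

    # Pool the tokens of each cluster and activate the cluster that
    # gives the query the highest likelihood.
    pooled = {}
    for d, c in zip(docs, labels):
        pooled.setdefault(c, []).extend(d)
    active = max(pooled, key=lambda c: query_likelihood(unigram(pooled[c])))

    # Relevance model: P(w|R) proportional to the sum over documents D
    # in the activated cluster of P(w|D) * P(Q|D).
    weights = Counter()
    for d, c in zip(docs, labels):
        if c != active:
            continue
        model = unigram(d)
        p_query = query_likelihood(model)
        for w in background:
            weights[w] += smoothed(model, background, w, lam) * p_query

    total = sum(weights.values())
    if total == 0:
        raise ValueError("query has zero likelihood under every document")
    return active, {w: v / total for w, v in weights.items()}
```

Restricting the sum to the activated cluster is what plays the role of noise filtering here: documents outside the cluster contribute nothing to the estimate, however they rank in the initial retrieval.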
Pages: 236-246
Number of pages: 10
Related papers
17 in total
[1] Amari S.I., Natural gradient works efficiently in learning, Neural Comput., 10, pp. 251-276, (1998)
[2] Deerwester S., Dumais S.T., Furnas G.W., Landauer T.K., Harshman R., Indexing by latent semantic analysis, J. Am. Soc. Inform. Sci. Technol., 41, pp. 391-407, (1990)
[3] Efthimiadis E.N., Interactive query expansion: A user-based evaluation in a relevance feedback environment, J. Am. Soc. Inform. Sci. Technol., 51, pp. 989-1003, (2000)
[4] Hansen L.K., Larsen J., Kolenda T., Blind detection of independent dynamic components, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3197-3200, (2001)
[5] Hofmann T., Probabilistic latent semantic indexing, Proceedings of the 22nd ACM SIGIR Conference on Research and Development in Information Retrieval, (1999)
[6] Kolenda T., Adaptive Tools in Virtual Environments: Independent Component Analysis for Multimedia, Informatics and Mathematical Modelling, (2002)
[7] Kolenda T., Hansen L.K., Winther O., Sigurdsson S., DTU: Toolbox, Informatics and Mathematical Modelling, (2002)
[8] Lafferty J., Zhai C., Document language models, query models and risk minimization for information retrieval, Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 111-119, (2001)
[9] Lautrup B., Hansen L.K., Law I., Morch N., Svarer C., Strother S.C., Massive weight sharing: A cure for extremely ill-posed problems, Proceedings of the Workshop on Supercomputing in Brain Research: From Tomography to Neural Networks (WSBRFTNN'95), pp. 137-148, (1995)
[10] Lavrenko V., Croft W.B., Relevance-based language models, Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 120-127, (2001)