An Improved Retrievability-Based Cluster-Resampling Approach for Pseudo Relevance Feedback

Cited by: 1
Author
Bashir, Shariq [1]
Affiliation
[1] Imam Muhammad Ibn Saud Univ, Informat Management Dept, Coll Comp & Informat Sci, Riyadh 11564, Saudi Arabia
Keywords
document clustering; machine learning; information retrieval; pseudo-relevance feedback; query expansion; retrieval bias; retrievability measure
DOI
10.3390/computers5040029
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Cluster-based pseudo-relevance feedback (PRF) is an effective approach for finding relevant documents to use as relevance feedback. The standard approach constructs clusters for PRF solely on the basis of high similarity between retrieved documents. It works well as long as the retrieval bias of the retrieval model does not affect the retrievability of documents. In our experiments, we observed that when a collection exhibits retrieval bias, the highly retrievable documents in the clusters are frequently retrieved at top positions for most queries, and these documents drift the relevance feedback away from relevant documents. To reduce this retrieval-bias noise, we enhance the standard cluster construction approach by constructing clusters on the basis of both high similarity and retrievability. We call this retrievability and cluster-based PRF. This enhanced approach keeps only those documents in the clusters that are not frequently retrieved due to retrieval bias. Although this approach improves effectiveness, it penalizes highly retrievable documents even when they are the most relevant to the clusters. To handle this problem, in a second approach we extend the basic retrievability concept by mining the frequent neighbors of the clusters. The frequent-neighbors approach keeps only those documents in the clusters that are frequently retrieved together with other neighbors of the clusters and infrequently retrieved together with documents that are not part of the clusters. Experimental results show that the two proposed extensions help identify relevant documents for relevance feedback and increase the effectiveness of queries.
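The first extension described in the abstract can be illustrated with a minimal sketch. It assumes the common cumulative form of the retrievability measure, in which a document's score is the number of queries that rank it within a rank cutoff. The function names, the toy rankings, and the `max_score` threshold are hypothetical and only for illustration; the abstract does not specify how the paper combines similarity with retrievability, so this is not the author's implementation.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count, for each document, how many queries rank it within the top `cutoff`.

    `rankings` maps a query id to its ranked list of document ids.
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def filter_cluster(cluster, scores, max_score):
    """Keep only cluster members whose retrievability is at most `max_score`.

    `max_score` is an assumed, hand-picked threshold for this sketch.
    """
    return [doc_id for doc_id in cluster if scores[doc_id] <= max_score]


# Toy data: "d1" is retrieved near the top for every query.
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d4", "d2"],
    "q3": ["d1", "d5", "d6"],
}
scores = retrievability(rankings, cutoff=2)
kept = filter_cluster(["d1", "d2", "d4"], scores, max_score=1)
print(kept)
```

In this toy setting the frequently retrieved document `d1` is dropped from the cluster, which also shows the drawback the abstract mentions: the filter removes `d1` regardless of how relevant it is to the cluster.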
Pages: 20