Recent and Robust Query Auto-Completion

Citations: 46
Authors
Whiting, Stewart [1]
Jose, Joemon M. [1]
Affiliations
[1] Univ Glasgow, Sch Comp Sci, Glasgow, Lanark, Scotland
Source
WWW'14: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB | 2014
Keywords
Time; temporal; recency; query; text; completion; auto-completion
DOI
10.1145/2566486.2568009
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline Classification Code
0812
Abstract
Query auto-completion (QAC) is a common interactive feature that assists users in formulating queries by providing completion suggestions as they type. In order for QAC to minimise the user's cognitive and physical effort, it must: (i) suggest the user's intended query after minimal input keystrokes, and (ii) rank the user's intended query highly in completion suggestions. Typically, QAC approaches rank completion suggestions by their past popularity. Accordingly, QAC is usually very effective for previously seen and consistently popular queries. Users are increasingly turning to search engines to find out about unpredictable emerging and ongoing events and phenomena, often using previously unseen or unpopular queries. Consequently, QAC must be both robust and time-sensitive - that is, able to sufficiently rank both consistently and recently popular queries in completion suggestions. To address this trade-off, we propose several practical completion suggestion ranking approaches, including: (i) a sliding window of query popularity evidence from the past 2-28 days, (ii) the query popularity distribution in the last N queries observed with a given prefix, and (iii) short-range query popularity prediction based on recently observed trends. Using real-time simulation experiments, we extensively investigated the parameters necessary to maximise QAC effectiveness for three openly available query log datasets with prefixes of 2-5 characters: MSN and AOL (both English), and Sogou 2008 (Chinese). Optimal parameters vary for each query log, capturing the differing temporal dynamics and querying distributions. Results demonstrate consistent and language-independent improvements of up to 9.2% over a non-temporal QAC baseline for all query logs with prefix lengths of 2-3 characters. This work is an important step towards more effective QAC approaches.
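The first two ranking approaches named in the abstract, the sliding window and the last-N queries per prefix, can be illustrated with a minimal sketch. It assumes the query log is a list of `(timestamp, query)` pairs; the function names, parameters, and toy log below are hypothetical and are not taken from the paper. The short-range prediction approach (iii) is not covered.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


def _top_k(counts, k):
    # Sort by descending count, then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in ranked[:k]]


def rank_sliding_window(log, prefix, now, window_days=7, k=4):
    """Rank completions by popularity within the last `window_days` days.

    Illustrative only: `log` is assumed to be (datetime, query) pairs.
    """
    start = now - timedelta(days=window_days)
    counts = Counter(
        query for ts, query in log
        if start <= ts < now and query.startswith(prefix)
    )
    return _top_k(counts, k)


def rank_last_n(log, prefix, now, n=100, k=4):
    """Rank completions by popularity among the last `n` queries with `prefix`."""
    recent = deque(maxlen=n)  # keeps only the n most recent matching queries
    for ts, query in sorted(log):  # chronological order
        if ts < now and query.startswith(prefix):
            recent.append(query)
    return _top_k(Counter(recent), k)


# Toy log: "weather" is popular over the long term, "wembley" only recently.
now = datetime(2014, 4, 10)
log = [
    (datetime(2014, 3, 1), "weather"),
    (datetime(2014, 3, 2), "weather"),
    (datetime(2014, 3, 3), "weather"),
    (datetime(2014, 4, 8), "wembley"),
    (datetime(2014, 4, 9), "wembley"),
    (datetime(2014, 4, 9), "weather"),
]

print(rank_sliding_window(log, "we", now, window_days=7))   # recent evidence only
print(rank_sliding_window(log, "we", now, window_days=60))  # longer history
print(rank_last_n(log, "we", now, n=3))
```

In this toy log, the 7-day window and the last-3 ranking both place the recently popular query first, while the 60-day window favours the consistently popular one. This mirrors the robustness versus recency trade-off the abstract describes, where the window length and N are the parameters to tune per query log.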
Pages: 971-981
Page count: 11