Identifying top relevant dates for implicit time sensitive queries

Cited by: 14
Authors
Campos, Ricardo [1, 2]
Dias, Gael [3]
Jorge, Alipio Mario [2, 4]
Nunes, Celia [5, 6]
Affiliations
[1] Polytech Inst Tomar, ICT Dept, Tomar, Portugal
[2] INESC Technol & Sci, INESC TEC, LIAAD, Oporto, Portugal
[3] Univ Caen Basse Normandie, HULTECH GREYC, Caen, France
[4] Univ Porto, Fac Sci, DCC, Oporto, Portugal
[5] Univ Beira Interior, Dept Math, Covilha, Portugal
[6] Univ Beira Interior, Ctr Math & Applicat, Covilha, Portugal
Source
INFORMATION RETRIEVAL JOURNAL | 2017 / Vol. 20 / Issue 04
Keywords
Temporal information retrieval; Implicit time sensitive queries; Temporal query understanding; Relevant temporal expressions
DOI
10.1007/s10791-017-9302-1
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Despite clear improvements in temporal search and retrieval applications, current search engines remain mostly unaware of the temporal dimension. In most cases, systems only let the user restrict the search to a particular time period, or simply rely on an explicitly specified time span. If the user is not explicit about his or her search intent (e.g., "philip seymour hoffman"), search engines are likely to fail to present an overall historical perspective of the topic, and in most such cases return only the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of a temporal expression in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem more adequately, we propose in this paper to detect the temporal expressions that are relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. Moreover, instead of focusing only on the detection of the most obvious date, we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics, and a classification methodology that identifies the set of top relevant dates for a given implicit time sensitive query while filtering out the non-relevant ones. An extensive experimental evaluation against several baselines on web corpora collections shows that our approach offers promising results in the field of temporal information retrieval (T-IR).
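The general idea in the abstract (score candidate years by how often they co-occur with the query words in document content, then keep only those above a cut-off) can be illustrated with a minimal sketch. This is not the similarity measure or classifier defined in the paper: the Dice coefficient, the document-level co-occurrence counts, the year regular expression, the 0.3 threshold, and the names `dice_year_scores` and `relevant_years` are all assumptions chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumption: candidate dates are four-digit years between 1000 and 2099.
YEAR_RE = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def dice_year_scores(query, docs):
    """Score each year found in docs by its Dice co-occurrence with the query.

    Dice(q, y) = 2 * |docs with q and y| / (|docs with q| + |docs with y|).
    A document "contains the query" when it contains every query word.
    """
    query_words = set(query.lower().split())
    docs_with_query = 0
    docs_with_year = defaultdict(int)
    docs_with_both = defaultdict(int)

    for text in docs:
        lowered = text.lower()
        words = set(re.findall(r"[a-z]+", lowered))
        years = set(YEAR_RE.findall(lowered))
        has_query = query_words <= words
        if has_query:
            docs_with_query += 1
        for year in years:
            docs_with_year[year] += 1
            if has_query:
                docs_with_both[year] += 1

    # Every scored year appears in at least one document, so the
    # denominator is never zero.
    return {
        year: 2 * docs_with_both[year] / (docs_with_query + docs_with_year[year])
        for year in docs_with_year
    }


def relevant_years(query, docs, threshold=0.3):
    """Return years whose score reaches the threshold, best first."""
    scores = dice_year_scores(query, docs)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [year for year, score in ranked if score >= threshold]


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the paper.
    corpus = [
        "Philip Seymour Hoffman won an Oscar in 2006 for Capote.",
        "Philip Seymour Hoffman died in 2014.",
        "The 2014 Winter Olympics were held in Sochi.",
        "Philip Seymour Hoffman was born in 1967.",
        "Stock markets rose in 1999.",
    ]
    print(relevant_years("philip seymour hoffman", corpus))
```

In this toy setting a year that never appears alongside the query words scores zero and is filtered out, while years that appear in documents about the query are kept and ranked. A fixed threshold is the simplest possible stand-in for the classification step the abstract mentions.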
Pages: 363-398
Page count: 36