Is this news article still relevant? Ranking by contemporary relevance in archival search

Times cited: 0
Authors
Jatowt, Adam [1 ]
Sato, Mari [2 ]
Draxl, Simon [1 ]
Duan, Yijun [3 ]
Campos, Ricardo [4 ]
Yoshikawa, Masatoshi [5 ]
Affiliations
[1] Univ Innsbruck, Innsbruck, Austria
[2] Kyoto Univ, Kyoto, Japan
[3] AIST, Tokyo, Japan
[4] Univ Beira Interior, LIAAD INESCTEC, Covilha, Portugal
[5] Osaka Seikei Univ, Osaka, Japan
Keywords
News archives; Information retrieval; Contemporary relevance; Relevance
DOI
10.1007/s00799-023-00377-y
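Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501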
Abstract
Our civilization creates enormous volumes of digital data, a substantial fraction of which is preserved and made publicly available for present and future use. In addition, historical born-analog records are progressively being digitized and incorporated into digital document repositories. While professionals often have a clear idea of what they are looking for in document archives, average users are likely to have no precise search needs when accessing available archives (e.g., through their online interfaces). Thus, if search results are to be relevant and appealing to average users, they should include engaging and recognizable material. However, state-of-the-art document archival retrieval systems essentially use the same approaches as search engines for synchronic document collections. In this article, we develop novel ranking criteria for assessing the usefulness of archived content based on its estimated relationship with the present, which we call contemporary relevance. Contemporary relevance can be used to improve access to archival document collections, increasing the likelihood that users will discover interesting or valuable material. We then present an effective strategy for estimating the degree of contemporary relevance of news articles using a learning-to-rank approach based on a variety of diverse features, and we successfully evaluate it on the New York Times news collection. Incorporating contemporary relevance computation into archival retrieval systems should enable a new search style in which search results relate to the context of the searcher's own time, and thereby have the potential to engage archive users. As a proof of concept, we develop and demonstrate a working prototype of a simplified ranking model that operates on top of the Portuguese Web Archive portal (arquivo.pt).
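The abstract states that contemporary relevance is estimated with a learning-to-rank model over diverse features but does not list those features or the exact model. Purely as a hedged illustration of the general idea, the sketch below fits a pointwise linear ranker (ridge regression, a simple stand-in chosen here, not necessarily the authors' method) to graded relevance labels and then sorts articles by predicted score. The Article fields entity_recency, topic_similarity and anniversary, the toy data, and all function names are hypothetical.

```python
"""Minimal pointwise learning-to-rank sketch for contemporary relevance.

Assumptions (hypothetical, not taken from the paper): each archived article is
described by a few numeric features, assessors supply graded contemporary
relevance labels, and a linear ridge-regression scorer is used for ranking.
"""
from dataclasses import dataclass

# Hypothetical feature names, for illustration only.
FEATURES = ("entity_recency", "topic_similarity", "anniversary")


@dataclass
class Article:
    title: str
    entity_recency: float    # hypothetical: share of the article's entities still in recent news
    topic_similarity: float  # hypothetical: similarity of the article's topic to current news
    anniversary: float       # hypothetical: 1.0 if the reported event has an upcoming anniversary


def feature_vector(a: Article) -> list[float]:
    # Append a constant 1.0 so the last weight acts as a bias term.
    return [getattr(a, name) for name in FEATURES] + [1.0]


def solve(m: list[list[float]], v: list[float]) -> list[float]:
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def train_ridge(articles: list[Article], labels: list[float], lam: float = 1e-6) -> list[float]:
    """Fit weights w minimizing sum((w.x - y)^2) + lam * |w|^2 (pointwise ranking)."""
    xs = [feature_vector(a) for a in articles]
    d = len(xs[0])
    # Normal equations: (X^T X + lam * I) w = X^T y.
    xtx = [[sum(x[i] * x[j] for x in xs) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, labels)) for i in range(d)]
    return solve(xtx, xty)


def score(a: Article, w: list[float]) -> float:
    """Predicted contemporary relevance of one article."""
    return sum(wi * xi for wi, xi in zip(w, feature_vector(a)))


def rank(articles: list[Article], w: list[float]) -> list[Article]:
    """Order articles by predicted contemporary relevance, highest first."""
    return sorted(articles, key=lambda a: score(a, w), reverse=True)


if __name__ == "__main__":
    # Toy, made-up training data with graded labels from 0 (not relevant today) to 3.
    train = [
        Article("A", 0.7, 0.6, 1.0),
        Article("B", 0.1, 0.2, 0.0),
        Article("C", 0.6, 0.5, 0.0),
        Article("D", 0.2, 0.9, 0.0),
    ]
    w = train_ridge(train, [3.0, 0.0, 2.0, 1.0])
    print([a.title for a in rank(train, w)])  # ['A', 'C', 'D', 'B']
```

A pointwise regressor is only the simplest learning-to-rank family; pairwise or listwise objectives, which optimize the ordering directly rather than the label values, are common alternatives, and the tiny ridge term here mainly keeps the linear system well conditioned.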
Pages: 197-216
Number of pages: 20
Related papers
83 entries in total
  • [1] Allan J., 1998, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P37, DOI 10.1145/290941.290954
  • [2] Allan J., TOPIC DETECTION TRAC
  • [3] Alonso O., 2007, ACM SIGIR FORUM, V41, P35, DOI 10.1145/1328964.1328968
  • [4] [Anonymous], 2016, SYNTHESIS LECT HUMAN, DOI 10.1007/978-3-031-02163-3
  • [5] Arikan I., 2009, WSDM
  • [6] Au Yeung C., 2011, CIKM 11, P1231, DOI 10.1145/2063576.2063755
  • [7] Berberich Klaus, 2007, 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P519, DOI 10.1145/1277741.1277831
  • [8] Berberich K., 2013, SIGIR WORKSH
  • [9] Berberich K, 2010, LECT NOTES COMPUT SC, V5993, P13, DOI 10.1007/978-3-642-12275-0_5
  • [10] Brank J., 2017, P SIKDD