Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems

Cited by: 5
Authors
Gonzalez-Saez, Gabriela Nicole [1 ]
Mulhem, Philippe [1 ]
Goeuriot, Lorraine [1 ]
Affiliations
[1] Univ Grenoble Alpes, Inst Engn, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France
Source
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2021 | 2021 / Vol. 12880
Funding
Austrian Science Fund (FWF);
Keywords
Information retrieval evaluation; Test collection; Result delta;
DOI
10.1007/978-3-030-85251-1_8
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The evaluation of information retrieval systems follows the Cranfield paradigm, in which several IR systems are evaluated against a common evaluation environment (test collection and evaluation settings). The Cranfield paradigm requires the evaluation environments (EEs) to be strictly identical in order to compare the systems' performances. For cases where this paradigm cannot be applied, e.g. when we do not have access to the systems' code, we consider an evaluation framework that allows for slight changes between EEs, such as the evolution of the document corpus or of the topics. To do so, we propose to compare systems evaluated on different environments through a reference system, called the pivot. In this paper, we present and validate a method to select a pivot, which is used to construct a correct ranking of systems evaluated in different environments. We test our framework on the TREC-COVID test collection, which consists of five rounds of growing sets of topics, documents, and relevance judgments. The results of our experiments show that the pivot strategy can produce a correct ranking of systems evaluated on an evolving test collection.
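The pivot strategy described in the abstract can be illustrated as follows: a single reference system (the pivot) is run in every evaluation environment, and each other system is compared to the pivot's score within its own EE, so that systems evaluated on different EEs become comparable through their deltas to the shared pivot. Below is a minimal Python sketch under the assumption of a simple relative delta; the function names, the choice of a BM25 pivot, and the scores are hypothetical illustrations, not the paper's exact formulation.

```python
"""Sketch of pivot-based ranking across evaluation environments (EEs).

Assumptions (not from the paper): a relative "result delta" to the pivot,
a BM25 pivot system, and made-up effectiveness scores.
"""


def result_delta(system_score: float, pivot_score: float) -> float:
    """Relative difference between a system and the pivot, both measured
    inside the same evaluation environment."""
    return (system_score - pivot_score) / pivot_score


def rank_across_environments(scores_by_ee, pivot):
    """Rank systems that were each evaluated in a different EE.

    scores_by_ee maps EE name -> {system name -> effectiveness score,
    e.g. MAP or NDCG}. The pivot must be scored in every EE, so each
    system is compared to the pivot locally instead of comparing raw
    scores across EEs.
    """
    deltas = {}
    for ee, scores in scores_by_ee.items():
        pivot_score = scores[pivot]
        for system, score in scores.items():
            if system != pivot:
                deltas[system] = result_delta(score, pivot_score)
    # A larger delta over the shared pivot means a better relative rank.
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Two TREC-COVID-like rounds: corpus and topics evolve between rounds,
    # but the pivot is run in both, anchoring the comparison.
    scores = {
        "round1": {"pivot_bm25": 0.30, "system_A": 0.36},
        "round2": {"pivot_bm25": 0.25, "system_B": 0.28},
    }
    print(rank_across_environments(scores, pivot="pivot_bm25"))
    # -> [('system_A', 0.2...), ('system_B', 0.12...)]
```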
Pages: 91-102
Number of pages: 12