MEASURING THE RELIABILITY OF RANKING IN INFORMATION RETRIEVAL SYSTEMS EVALUATION

Cited by: 0
Authors:
Rajagopal, Prabha [1]
Ravana, Sri Devi [1]
Affiliations:
[1] Univ Malaya, Kuala Lumpur 50603, Malaysia
Keywords:
Information Retrieval; System Evaluation; Reliability Testing; Intraclass Correlation Coefficient; TREC; Information Systems
DOI:
10.22452/mjcs.vol32no4.1
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
A reliable system is crucial to satisfying users' needs, but reliability depends on the varying effects of the test collection. Reliability is usually evaluated through the similarity of a set of system rankings, to understand the impact of variations in relevance judgments or effectiveness metrics. However, such evaluations do not indicate the reliability of individual system rankings. This study proposes a method to measure the reliability of individual retrieval systems based on their relative rankings. The Intraclass Correlation Coefficient (ICC) is used as a reliability measure of individual system ranks. Experiments cover various combinations of effectiveness metrics grouped by cluster, different topic set sizes, and Kendall's tau correlation with the gold-standard ranking. The metrics average precision (AP) and rank-biased precision (RBP) are suitable for measuring the reliability of system rankings and for generalizing the outcome to other similar metrics. Highly reliable system rankings belong mostly to the top- and mid-performing systems and are strongly correlated with the gold-standard system ranks. The proposed method can be replicated on other test collections, as it uses relative ranking to measure reliability. The study measures the ranking reliability of individual retrieval systems to indicate the level of reliability a user can expect from a retrieval system, regardless of its performance.
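The abstract pairs two statistics: an ICC over the ranks that several effectiveness metrics assign to each system, and Kendall's tau between one ranking and a gold standard. A minimal sketch of both computations follows; the one-way ICC(1,1) form, the toy system ranks, and the gold-standard ranking are all illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np
from scipy.stats import kendalltau

def icc_1_1(ratings):
    """One-way random-effects ICC(1,1) for an n-targets x k-raters matrix.

    Here rows are retrieval systems and columns are effectiveness metrics
    acting as "raters" of each system's rank. (The paper may use a
    different ICC model; this is one common variant.)
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    row_means = ratings.mean(axis=1)
    # ANOVA mean squares: between-system and within-system variation
    ms_between = k * ((row_means - ratings.mean()) ** 2).sum() / (n - 1)
    ms_within = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical ranks of 5 systems under 3 effectiveness metrics
ranks = np.array([
    [1, 1, 2],
    [2, 3, 1],
    [3, 2, 3],
    [4, 5, 4],
    [5, 4, 5],
])
print(f"ICC(1,1): {icc_1_1(ranks):.3f}")  # close to 1 when metrics agree

# Agreement of one metric's ranking with a gold-standard ranking
tau, _ = kendalltau(ranks[:, 1], [1, 2, 3, 4, 5])
print(f"Kendall's tau vs gold standard: {tau:.2f}")
```

Note that `scipy.stats.kendalltau` computes the tau-b variant, which corrects for ties; with tie-free rankings like these it coincides with the classical tau used in TREC-style comparisons.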
Pages: 253-268
Page count: 16
References (29 in total):
[1] Algina J, 1978, Psychol Bull, V85, P135.
[2] [Anonymous], 2016, INTERRATER RELIABILI.
[3] Baccini A, Dejean S, Lafage L, Mothe J. How many performance measures to evaluate information retrieval systems? [J]. Knowledge and Information Systems, 2012, 30(3): 693-713.
[4] Bailey P, Moffat A, Scholer F, Thomas P. User variability and IR system evaluation [J]. SIGIR 2015: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015: 625-634.
[5] Barrett P, 2001, ASSESSING RELIABILIT.
[6] Bartko JJ. The intraclass correlation coefficient as a measure of reliability [J]. Psychological Reports, 1966, 19(1): 3-&.
[7] Bartko JJ. Various intraclass correlation reliability coefficients [J]. Psychological Bulletin, 1976, 83(5): 762-765.
[8] Blanco R, 2013, Journal of Web Semantics, V21, P923. DOI: 10.1016/j.websem.2013.05.005.
[9] Chua Y P, 2013, MASTERING RES STAT.
[10] de Melo G. Advances in Information Retrieval: 35th European Conference on IR Research, ECIR 2013, Proceedings, 2013: 869. DOI: 10.1007/978-3-642-36973-5_105.