A General Theory of IR Evaluation Measures

Cited by: 15
Authors
Ferrante, Marco [1]
Ferro, Nicola [2]
Pontarollo, Silvia [1]
Affiliations
[1] Univ Padua, Dept Math, I-35122 Padua, Italy
[2] Univ Padua, Dept Informat Engn, I-35122 Padua, Italy
Keywords
Representational theory of measurement; interval scale; IR evaluation measure; formal framework; RELEVANCE;
DOI
10.1109/TKDE.2018.2840708
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Interval scales are assumed by several basic descriptive statistics, such as mean and variance, and by many statistical significance tests that are used daily in IR to compare systems. Unfortunately, so far there has been no systematic and formal study of the actual scale properties of IR measures. Therefore, in this paper, we develop a theory of Information Retrieval (IR) evaluation measures, based on the representational theory of measurement, to determine whether and when IR measures are interval scales. We found that common set-based retrieval measures, namely Precision, Recall, and F-measure, are always interval scales in the case of binary relevance, whereas in the case of multi-graded relevance this holds only when the relevance degrees themselves are on a ratio scale and we define a specific partial order among systems. In the case of rank-based retrieval measures, namely AP, gRBP, DCG, and ERR, only gRBP is an interval scale, and only when we choose a specific value of the parameter p and define a specific total order among systems; all the other IR measures are not interval scales. Besides the formal framework itself and the proof of the scale properties of several commonly used IR measures, the paper also defines some brand new set-based and rank-based IR evaluation measures that are guaranteed to be interval scales.
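For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of the usual textbook definitions of Precision, Recall, F-measure, and binary Rank-Biased Precision (RBP). It is not taken from the paper: the function names, the document identifiers, and the choice p = 0.5 are illustrative assumptions, and the graded variant gRBP and the paper's new interval-scale measures are not shown.

```python
from typing import Sequence, Set


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of retrieved documents that are relevant (binary relevance)."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that were retrieved (binary relevance)."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def f_measure(retrieved: Set[str], relevant: Set[str]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    p, r = precision(retrieved, relevant), recall(retrieved, relevant)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def rbp(rels: Sequence[int], p: float) -> float:
    """Binary Rank-Biased Precision: (1 - p) * sum_i rel_i * p**(i - 1).

    `rels` lists 0/1 relevance judgments in rank order (rank 1 first);
    `p` is the persistence parameter, assumed to lie in (0, 1).
    """
    return (1 - p) * sum(r * p**i for i, r in enumerate(rels))


# Hypothetical run: four retrieved documents, three relevant documents.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d6"}

print(precision(retrieved, relevant))  # 2 of 4 retrieved are relevant
print(recall(retrieved, relevant))     # 2 of 3 relevant are retrieved
print(f_measure(retrieved, relevant))
print(rbp([1, 0, 1], p=0.5))           # illustrative value of p only
```

Averaging such scores across topics, or comparing them with a t-test, implicitly treats them as interval-scale values, which is the assumption the paper examines.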
Pages: 409-422
Page count: 14