Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently?

Cited by: 28
Authors
Tamm, Yan-Martin [1 ]
Damdinov, Rinchin [1 ]
Vasilev, Alexey [1 ]
Affiliations
[1] Sber AI Lab, Moscow, Russia
Source
15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021) | 2021
Keywords
recommender systems; metrics; offline evaluation
DOI
10.1145/3460231.3478848
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Offline evaluation is a popular approach to determining the best algorithm in terms of a chosen quality metric. However, if that metric measures something other than what is expected, the miscommunication can lead to poor decisions and wrong conclusions. In this paper, we thoroughly investigate the quality metrics used to evaluate recommender systems. We examine both the practical side, implementations found in modern RecSys libraries, and the theoretical side, definitions given in academic papers. We find that Precision is the only metric understood consistently across papers and libraries, while other metrics admit different interpretations. Metrics implemented in different libraries sometimes share a name but measure different things, and therefore produce different results on the same input. When defining metrics in an academic paper, authors sometimes omit explicit formulations or point to references that contain no explanation either. In 47% of cases, we cannot easily determine how a metric is defined because the definition is unclear or absent. These findings highlight yet another difficulty in recommender system evaluation and call for more detailed descriptions of evaluation protocols.
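
To make the abstract's point concrete, here is a minimal sketch (not from the paper or any specific library) of how two implementations can share a metric's name yet disagree on the same input: two plausible Recall@k variants that differ only in how they normalize. All function names and data below are hypothetical.

    # Two plausible Recall@k definitions that share a name but normalize
    # differently, so they return different values for the same input.

    def recall_at_k_full(recommended, relevant, k):
        # Normalize by the total number of relevant (held-out) items.
        hits = len(set(recommended[:k]) & set(relevant))
        return hits / len(relevant)

    def recall_at_k_capped(recommended, relevant, k):
        # Normalize by min(k, |relevant|), so a perfect top-k list scores 1.0.
        hits = len(set(recommended[:k]) & set(relevant))
        return hits / min(k, len(relevant))

    recommended = ["a", "b", "c"]          # one user's top-3 recommendations
    relevant = ["a", "b", "d", "e", "f"]   # that user's held-out ground truth

    print(recall_at_k_full(recommended, relevant, k=3))    # 2/5 = 0.4
    print(recall_at_k_capped(recommended, relevant, k=3))  # 2/3 = 0.666...

Both normalizations appear in practice, and neither is wrong in itself, which is exactly why an evaluation protocol should state which formula it uses.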
Pages: 708-713
Page count: 6