Towards reproducibility in recommender-systems research

Cited by: 40

Authors:

Beel, Joeran [1,5]

Breitinger, Corinna [1,2]

Langer, Stefan [1,3]

Lommatzsch, Andreas [4]

Gipp, Bela [1,5]

Affiliations:

[1] Docear, Constance, Germany

[2] Linnaeus Univ, Sch Comp Sci Phys & Math, S-35195 Vaxjo, Sweden

[3] Otto Von Guericke Univ, Dept Comp Sci, D-39106 Magdeburg, Germany

[4] Tech Univ Berlin, DAI Lab, Ernst Reuter Pl 7, D-10587 Berlin, Germany

[5] Univ Konstanz, Dept Informat Sci, Universitatsstr 10, D-78464 Constance, Germany

Source:

USER MODELING AND USER-ADAPTED INTERACTION | 2016, Vol. 26, Issue 1

Keywords:

Recommender systems; Evaluation; Experimentation; Reproducibility

DOI:

10.1007/s11257-016-9174-x

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Numerous recommendation approaches are in use today. However, comparing their effectiveness is a challenging task because evaluation results are rarely reproducible. In this article, we examine the challenge of reproducibility in recommender-systems research. We conduct experiments using Plista's news recommender system and Docear's research-paper recommender system. The experiments show that there are large discrepancies in the effectiveness of identical recommendation approaches in only slightly different scenarios, as well as large discrepancies for slightly different approaches in identical scenarios. For example, in one news-recommendation scenario, the performance of a content-based filtering approach was twice as high as that of the second-best approach, while in another scenario the same content-based filtering approach was the worst-performing approach. We found several determinants that may contribute to the large discrepancies observed in recommendation effectiveness. Determinants we examined include user characteristics (gender and age), datasets, weighting schemes, the time at which recommendations were shown, and user-model size. Some of the determinants have interdependencies. For instance, the optimal size of an algorithm's user model depended on users' age. Since minor variations in approaches and scenarios can lead to significant changes in a recommendation approach's performance, ensuring reproducibility of experimental results is difficult. We discuss these findings and conclude that to ensure reproducibility, the recommender-systems community needs to (1) survey other research fields and learn from them, (2) find a common understanding of reproducibility, (3) identify and understand the determinants that affect reproducibility, (4) conduct more comprehensive experiments, (5) modernize publication practices, (6) foster the development and use of recommendation frameworks, and (7) establish best-practice guidelines for recommender-systems research.

Pages: 69-101

Page count: 33
