A Framework for Dataset Benchmarking and Its Application to a New Movie Rating Dataset

Cited by: 18
Authors
Dooms, Simon [1]
Bellogin, Alejandro [2, 4]
De Pessemier, Toon [3]
Martens, Luc [3]
Affiliations
[1] Univ Ghent, Wica, G Crommenlaan 8 Box 201, B-9050 Ghent, Belgium
[2] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[3] Univ Ghent, iMinds, Wica, G Crommenlaan 8 Box 201, B-9050 Ghent, Belgium
[4] Univ Autonoma Madrid, Escuela Politecn Super, E-28049 Madrid, Spain
Keywords
Algorithms; Experimentation; Human Factors; Benchmark; dataset; evaluation; reproducibility; movietweetings; imdb; twitter; movielens
DOI
10.1145/2751565
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Rating datasets are of paramount importance in recommender systems research. They serve as input for recommendation algorithms, as simulation data, or for evaluation purposes. In the past, publicly accessible rating datasets were not abundantly available, leaving researchers no choice but to work with old and static datasets like MovieLens and Netflix. More recently, however, emerging trends such as social media and smartphones have been found to provide rich data sources that can be turned into valuable research datasets. While dataset availability is growing, a structured way of introducing and comparing new datasets is still lacking. In this work, we propose a five-step framework to introduce and benchmark new datasets in the recommender systems domain. We illustrate our framework on a new movie rating dataset, called MovieTweetings, collected from Twitter. Following our framework, we detail the origin of the dataset, provide basic descriptive statistics, investigate external validity, report the results of a number of reproducible benchmarks, and conclude by discussing some interesting advantages and appropriate research use cases.
Pages: 28