Large-Scale Bandit Approaches for Recommender Systems

Cited by: 21
Authors
Zhou, Qian [1 ]
Zhang, XiaoFang [1 ,2 ]
Xu, Jin [1 ]
Liang, Bin [1 ]
Affiliations
[1] Soochow Univ, Dept Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210033, Peoples R China
Source
NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I | 2017, Vol. 10634
Keywords
Recommender systems; Multi-Armed Bandit; Context-free;
DOI
10.1007/978-3-319-70087-8_83
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recommender systems have been successfully applied in many areas to predict users' preferences. However, these systems face an exploration-exploitation dilemma when making recommendations: they must simultaneously exploit items known to interest users and explore new items to improve satisfaction. In this paper, we address this dilemma with Multi-Armed Bandit (MAB) approaches, focusing on large-scale recommender systems with vast or infinite item sets. We propose two large-scale bandit approaches for settings where no prior information is available. The continuous exploration in our approaches addresses the cold-start problem in recommender systems. Furthermore, our context-free approaches rely only on users' click behavior, without depending on prior information. We prove theoretically that our approaches converge to optimal item recommendations in the long run. Experimental results show that our approaches provide more accurate recommendations than several classic bandit approaches in terms of click-through rate, with less computation time.
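The abstract describes context-free bandit recommendation driven purely by click feedback. As a hypothetical illustration (not the paper's own algorithm), the classic UCB1 rule from reference [5] can be sketched for this setting: each item is an arm, a recommendation is a pull, and a click is a reward of 1. The item names, probabilities, and round counts below are made up for the example.

```python
import math
import random

def ucb1_recommend(n_items, n_rounds, click_prob):
    """Context-free UCB1: recommend one item per round, observe a
    click (1) or no click (0), and update per-item estimates."""
    counts = [0] * n_items      # times each item was recommended
    clicks = [0.0] * n_items    # total clicks observed per item

    for t in range(1, n_rounds + 1):
        if t <= n_items:
            # Recommend each item once before applying the UCB rule.
            item = t - 1
        else:
            # Pick the item with the highest upper confidence bound:
            # empirical CTR plus an exploration bonus that shrinks
            # as the item is recommended more often.
            item = max(
                range(n_items),
                key=lambda i: clicks[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        # Simulate the user's click behavior (Bernoulli feedback).
        reward = 1 if random.random() < click_prob[item] else 0
        counts[item] += 1
        clicks[item] += reward

    return counts

random.seed(0)
counts = ucb1_recommend(n_items=3, n_rounds=2000,
                        click_prob=[0.1, 0.3, 0.7])
print(counts)
```

Over enough rounds, the item with the highest true click probability accumulates the most recommendations, which is the convergence behavior the abstract claims for its own (different, large-scale) approaches.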
Pages: 811-821 (11 pages)
References
18 items
[1]   Incorporating contextual information in recommender systems using a multidimensional approach [J].
Adomavicius, G ;
Sankaranarayanan, R ;
Sen, S ;
Tuzhilin, A .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2005, 23 (01) :103-145
[2]   Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions [J].
Adomavicius, G ;
Tuzhilin, A .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (06) :734-749
[3]  
[Anonymous], 2015, Recommender Systems Handbook, DOI 10.1007/978-1-4899-7637-6
[4]   Linear Bayes policy for learning in contextual-bandits [J].
Martin H, Jose Antonio ;
Vargas, Ana M. .
EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (18) :7400-7406
[5]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[6]   Fab: Content-based, collaborative recommendation [J].
Balabanovic, M ;
Shoham, Y .
COMMUNICATIONS OF THE ACM, 1997, 40 (03) :66-72
[7]  
Bubeck S., 2012, C LEARN THEOR, P42
[8]   Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems [J].
Bubeck, Sebastien ;
Cesa-Bianchi, Nicolo .
FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2012, 5 (01) :1-122
[9]  
Cesa-Bianchi N., 1998, Machine Learning. Proceedings of the Fifteenth International Conference (ICML'98), P100
[10]  
Cesa-Bianchi N., Lugosi G., 2006, Prediction, Learning, and Games