Scalable Deep Q-Learning for Session-Based Slate Recommendation

Cited by: 0

Authors:

Roy, Aayush Singha ^{[1,2]}

D'Amico, Edoardo ^{[1,2]}

Tragos, Elias ^{[1,2]}

Lawlor, Aonghus ^{[1,2]}

Hurley, Neil ^{[1,2]}

Affiliations:

[1] Univ Coll Dublin, Dublin, Ireland

[2] Insight Ctr Data Analyt, Dublin, Ireland

Source:

PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023 | 2023

Funding:

Science Foundation Ireland;

Keywords:

Recommender systems; Slate recommendation; Reinforcement learning;

DOI:

10.1145/3604915.3608843

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement learning (RL) has demonstrated great potential to improve slate-based recommender systems by optimizing recommendations for long-term user engagement. To handle the combinatorial action space in slate recommendation, recent works decompose the Q-value of a slate into item-wise Q-values, using an item-wise value-based policy. However, the common case where the value function is a parameterized function taking state and action as input results in a linearly increasing number of evaluations required to select an action, proportional to the number of candidate items. While slow training may be acceptable, this becomes intractable when considering the costly evaluation of the parameterized function, such as with deep neural networks, during model serving time. To address this issue, we propose an actor-based policy that reduces the evaluation of the Q-function to a subset of items, significantly reducing inference time and enabling practical deployment in real-world industrial settings. In our empirical evaluation, we demonstrate that our proposed approach achieves equivalent user session engagement to a value-based policy, while significantly reducing the slate serving time by at least 4 times.
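The cost argument in the abstract can be illustrated with a minimal Python sketch that contrasts a full scan over all candidates with Q evaluation restricted to a shortlist. This is not the authors' implementation. The abstract does not say how the subset is chosen, so the nearest-neighbour shortlist around an actor output is an assumption, as are the dot-product `q_value` (standing in for a deep network), the identity `actor`, and all function names and sizes.

```python
import random

def q_value(state, item):
    # Hypothetical stand-in for an expensive learned Q-function.
    return sum(s * e for s, e in zip(state, item))

def full_scan_slate(state, items, slate_size):
    # Value-based baseline: one Q evaluation per candidate item.
    ranked = sorted(range(len(items)),
                    key=lambda i: q_value(state, items[i]), reverse=True)
    return ranked[:slate_size], len(items)

def actor_slate(state, items, slate_size, subset_size, actor):
    # Assumed scheme: the actor proposes a point in item-embedding space,
    # and Q is evaluated only on the subset_size items closest to it.
    proto = actor(state)

    def sq_dist(i):
        return sum((p - e) ** 2 for p, e in zip(proto, items[i]))

    # A linear scan keeps the sketch short; a real system would more
    # likely use an approximate nearest-neighbour index here.
    shortlist = sorted(range(len(items)), key=sq_dist)[:subset_size]
    ranked = sorted(shortlist,
                    key=lambda i: q_value(state, items[i]), reverse=True)
    return ranked[:slate_size], len(shortlist)

rng = random.Random(0)
items = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(1000)]
state = [rng.uniform(-1, 1) for _ in range(4)]

slate_full, n_full = full_scan_slate(state, items, slate_size=5)
slate_actor, n_actor = actor_slate(state, items, slate_size=5,
                                   subset_size=50, actor=lambda s: list(s))
print("Q evaluations, full scan:", n_full)
print("Q evaluations, actor shortlist:", n_actor)
```

In this toy setup the number of Q evaluations drops from the catalogue size to `subset_size`. Whether the resulting slate matches the full-scan slate depends on how well the actor's output tracks high-value items, and the sketch does not try to reproduce that.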


Pages: 877-882

Page count: 6
