Q-ADER: An Effective Q-Learning for Recommendation With Diminishing Action Space

Cited by: 0
Authors
Li, Fan [1 ]
Qu, Hong [2 ]
Zhang, Liyan [3 ]
Fu, Mingsheng [2 ]
Chen, Wenyu [2 ]
Yi, Zhang [4 ]
Affiliations
[1] Sichuan Univ, West China Univ Hosp 2, Dept Obstet & Gynecol, Key Lab Birth Defects & Related Dis Women & Childr, Chengdu 610041, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] State Grid Sichuan Elect Power Co, Mkt Serv Ctr, Chengdu 610041, Peoples R China
[4] Sichuan Univ, Sch Comp Sci, Chengdu 610065, Peoples R China
Funding
US National Science Foundation; National Natural Science Foundation of China;
Keywords
Recommender systems; Standards; Q-learning; Training; Temporal difference learning; Reliability; Prediction algorithms; Error reduction; Value estimate error
DOI
10.1109/TNNLS.2024.3424254
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (RL) has been widely applied to personalized recommender systems (PRSs) because it can capture user preferences progressively. Among RL-based techniques, the deep Q-network (DQN) stands out as the most popular choice due to its simple update strategy and strong performance. Many recommendation scenarios involve a diminishing action space, where the set of available actions gradually shrinks so that duplicate items are not recommended. However, existing DQN-based recommender systems inherently grapple with a discrepancy between the fixed full action space assumed by the Q-network and the diminishing action space actually available during recommendation. This article elucidates how this discrepancy induces an issue, termed the action diminishing error, in the vanilla temporal difference (TD) operator. Because of this discrepancy, standard DQN methods cannot learn accurate value estimates, rendering them ineffective under a diminishing action space. To mitigate this issue, we propose the Q-learning-based action diminishing error reduction (Q-ADER) algorithm, which corrects the value estimate error at each step. In practice, Q-ADER augments standard TD learning with an error reduction term that is straightforward to implement on top of existing DQN algorithms. Experiments on four real-world datasets verify the effectiveness of the proposed algorithm.
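To make the abstract's core idea concrete, the sketch below (a minimal NumPy illustration, not the authors' implementation) contrasts a vanilla TD target computed over the fixed full action space with a target restricted to the still-available items, and folds the gap back in as a correction. The names num_items, recommended_so_far, and ader_correction, and the exact form of the correction, are hypothetical placeholders; the paper's actual error reduction term is not specified in the abstract.

# Minimal sketch (assumptions noted above): illustrating the gap between a TD target
# over the FULL action space and one over the DIMINISHING set of available items.
import numpy as np

rng = np.random.default_rng(0)

num_items = 10                         # full, fixed action space seen by the Q-network
gamma = 0.9                            # discount factor
q_values = rng.normal(size=num_items)  # Q(s', a) for all items, as a Q-network head would output
recommended_so_far = {1, 4, 7}         # items already recommended; must not be repeated
reward = 1.0                           # observed engagement reward for the transition

# Vanilla TD target: max over the full action space, ignoring that some items are no longer available.
vanilla_target = reward + gamma * q_values.max()

# TD target restricted to the diminishing available action space (duplicates masked out).
available = np.array([a for a in range(num_items) if a not in recommended_so_far])
masked_target = reward + gamma * q_values[available].max()

# Illustrative "action diminishing error": bias introduced by maximizing over unavailable items.
action_diminishing_error = vanilla_target - masked_target

# Hypothetical correction in the spirit of Q-ADER: subtract the estimated error from the vanilla target.
ader_correction = action_diminishing_error
corrected_target = vanilla_target - ader_correction

print(f"vanilla target   : {vanilla_target:.3f}")
print(f"masked target    : {masked_target:.3f}")
print(f"corrected target : {corrected_target:.3f}")  # equals the masked target in this toy illustration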
Pages: 8510-8524
Number of pages: 15