Maximum Entropy Policy for Long-Term Fairness in Interactive Recommender Systems

Times Cited: 4
Authors
Shi, Xiaoyu [1 ]
Liu, Quanliang [1 ,2 ]
Xie, Hong [1 ]
Bai, Yanan [3 ]
Shang, Mingsheng [1 ]
Affiliations
[1] Chinese Acad Sci, Chongqing Inst Green & Intelligent Technol, Chongqing Key Lab Big Data & Intelligent Comp, Chongqing 400714, Peoples R China
[2] Univ Chinese Acad Sci, Chongqing Sch, Chongqing 400714, Peoples R China
[3] Chongqing Univ Technol, Sch Artificial Intelligence, Chongqing 401135, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Entropy; Recommender systems; Training; Feedback loop; Training data; Robustness; Real-time systems; Long-term fairness; maximum entropy policy; popularity bias; recommender system; reinforcement learning; web services;
DOI
10.1109/TSC.2024.3349636
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This article considers the problem of maintaining the long-term fairness of item exposure in interactive recommender systems under a dynamic setting in which user preference and item popularity evolve over time. The challenge is that the evolving dynamics of user preference and item popularity in the feedback loop amplify the long-term "unfairness" of item exposure. To address this challenge, we first formulate a constrained Markov Decision Process (MDP) to capture the evolving dynamics of user preference. The constrained MDP imposes long-term fairness requirements via maximum entropy techniques. Moreover, to counteract the "unfairness"-amplifying effect caused by the evolving dynamics of item popularity in the feedback loop, we design a debiased reward function that eliminates popularity bias in the training data. In this way, the proposed framework maintains acceptable recommendation accuracy while exposing items as uniformly as possible, ensuring long-term benefits for users. To address the data-sparsity issue, the framework can easily integrate self-supervised learning methods to enhance state representation. Experiments on three datasets and a realistic reinforcement learning environment (Virtual-Taobao) demonstrate the effectiveness and superiority of the proposed framework in terms of recommendation accuracy and fairness, and show its robustness against data sparsity and noise.
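The abstract's two key ingredients, an entropy-regularized policy objective and a popularity-debiased reward, can be sketched in a minimal form as follows. The function names, the inverse-propensity form of the debiasing, and the coefficients `alpha` and `beta` are illustrative assumptions for this sketch, not details taken from the paper itself.

```python
import numpy as np

def debiased_reward(feedback, item_popularity, alpha=0.5):
    """Down-weight feedback on popular items via a simple inverse-propensity
    correction, so the training signal carries less popularity bias.
    The power-law propensity model and alpha are assumptions of this sketch."""
    propensity = max(item_popularity ** alpha, 1e-6)
    return feedback / propensity

def max_entropy_objective(action_probs, rewards, beta=0.1):
    """Expected (debiased) reward plus an entropy bonus: the entropy term
    pushes the recommendation policy toward exposing items as uniformly as
    possible, which serves as a proxy for long-term exposure fairness."""
    p = np.asarray(action_probs, dtype=float)
    r = np.asarray(rewards, dtype=float)
    entropy = -np.sum(p * np.log(p + 1e-12))  # Shannon entropy of the policy
    return np.sum(p * r) + beta * entropy
```

With equal rewards, a near-uniform exposure policy scores higher than a skewed one, and `beta` controls how much accuracy the policy trades for exposure uniformity.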
Pages: 1029-1043
Number of pages: 15