Who Should Be Given Incentives? Counterfactual Optimal Treatment Regimes Learning for Recommendation

Cited by: 3
Authors
Li, Haoxuan [1 ]
Zheng, Chunyuan [2 ]
Wu, Peng [3 ]
Kuang, Kun [4 ]
Liu, Yue [5 ]
Cui, Peng [6 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
[3] Beijing Technol & Business Univ, Beijing, Peoples R China
[4] Zhejiang Univ, Hangzhou, Peoples R China
[5] Renmin Univ China, Beijing, Peoples R China
[6] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Counterfactual; Optimal treatment regime; Recommender system; Causal inference; Regression
DOI
10.1145/3580305.3599550
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Effective personalized incentives can improve user experience and increase platform revenue, creating a win-win for users and e-commerce companies. Previous studies have used uplift modeling methods to estimate the conditional average treatment effects of incentives on users, and then allocated incentives by maximizing the sum of estimated treatment effects under a limited budget. However, some users will buy regardless of whether an incentive is given, yet will actively collect and use incentives if provided; such users are called "Always Buyers". Identifying and predicting these "Always Buyers" and reducing incentive delivery to them leads to a more rational incentive allocation. In this paper, we first divide users into five strata from an individual counterfactual perspective, and show that previous uplift modeling methods fail to identify and predict the "Always Buyers". We then propose principled counterfactual identification and estimation methods and prove their unbiasedness. We further propose a counterfactual entire-space multi-task learning approach to accurately learn a personalized incentive policy under a limited budget, and theoretically derive a lower bound on the reward of the learned policy. Extensive experiments on three real-world datasets covering two common incentive scenarios demonstrate the effectiveness of the proposed approaches.
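To make the allocation idea in the abstract concrete, the following minimal Python sketch (using NumPy) contrasts a naive uplift-ranked incentive policy with one that screens out predicted "Always Buyers" before spending the budget. The simulated purchase probabilities, the noise level, and the oracle Always-Buyer screen are all illustrative assumptions; the paper's actual contribution is principled counterfactual identification and an entire-space multi-task estimator, which this toy example does not implement.

import numpy as np

# Illustrative simulation only: stand-ins for quantities the paper estimates.
rng = np.random.default_rng(0)
n_users, budget = 10_000, 1_000

# Ground-truth potential purchase probabilities (unobservable in practice):
# p0 = P(buy | no incentive), p1 = P(buy | incentive).
p0 = rng.uniform(0.0, 1.0, n_users)
is_always = p0 > 0.8  # always-buy types: the incentive does not change them
p1 = np.where(is_always, p0,
              np.clip(p0 + rng.uniform(0.0, 0.3, n_users), 0.0, 1.0))
true_uplift = p1 - p0

# Noisy uplift estimates, as an uplift model would produce; the noise lets
# some zero-uplift Always Buyers look attractive to a naive ranking.
est_uplift = true_uplift + rng.normal(0.0, 0.15, n_users)

# Naive policy: spend the whole budget on the top users by estimated uplift.
naive = np.argsort(-est_uplift)[:budget]

# Counterfactual-aware policy: first screen out predicted Always Buyers,
# then rank the remaining users by estimated uplift. A perfect screen is
# assumed here purely for illustration.
eligible = np.flatnonzero(~is_always)
aware = eligible[np.argsort(-est_uplift[eligible])][:budget]

print("Always Buyers incentivized (naive):", int(is_always[naive].sum()))
print("true total uplift, naive: %.1f" % true_uplift[naive].sum())
print("true total uplift, aware: %.1f" % true_uplift[aware].sum())

Under this simulation, the naive ranking wastes part of the budget on zero-uplift Always Buyers whose estimates are inflated by noise, while the screened policy reallocates those incentives to persuadable users; the paper replaces the oracle screen used here with identifiable counterfactual estimates and proves a lower bound on the resulting policy reward.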
Pages: 1235-1247
Page count: 13