Who Should Be Given Incentives? Counterfactual Optimal Treatment Regimes Learning for Recommendation

Cited by: 3
Authors
Li, Haoxuan [1 ]
Zheng, Chunyuan [2 ]
Wu, Peng [3 ]
Kuang, Kun [4 ]
Liu, Yue [5 ]
Cui, Peng [6 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
[3] Beijing Technol & Business Univ, Beijing, Peoples R China
[4] Zhejiang Univ, Hangzhou, Peoples R China
[5] Renmin Univ China, Beijing, Peoples R China
[6] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Counterfactual; Optimal treatment regime; Recommender system; Causal inference; Regression
DOI
10.1145/3580305.3599550
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Effective personalized incentives can improve user experience and increase platform revenue, creating a win-win for users and e-commerce companies. Previous studies have used uplift modeling methods to estimate the conditional average treatment effects of incentives on users, and then allocated incentives by maximizing the sum of estimated treatment effects under a limited budget. However, some users will buy regardless of whether an incentive is given, yet will actively collect and use incentives if provided; such users are called "Always Buyers". Identifying and predicting these "Always Buyers" and reducing incentive delivery to them leads to a more rational incentive allocation. In this paper, we first divide users into five strata from an individual counterfactual perspective, and show that previous uplift modeling methods fail to identify and predict the "Always Buyers". We then propose principled counterfactual identification and estimation methods and prove their unbiasedness. We further propose a counterfactual entire-space multi-task learning approach to accurately learn a personalized incentive policy under a limited budget, and theoretically derive a lower bound on the reward of the learned policy. Extensive experiments on three real-world datasets covering two common incentive scenarios demonstrate the effectiveness of the proposed approaches.
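To make the allocation idea in the abstract concrete, the following minimal Python sketch (using NumPy) contrasts a naive uplift-ranked incentive policy with one that screens out predicted "Always Buyers" before spending the budget. The simulated purchase probabilities, the noise level, and the oracle Always-Buyer screen are all illustrative assumptions; the paper's actual contribution is principled counterfactual identification and an entire-space multi-task estimator, which this toy example does not implement.

import numpy as np

# Illustrative simulation only: stand-ins for quantities the paper estimates.
rng = np.random.default_rng(0)
n_users, budget = 10_000, 1_000

# Ground-truth potential purchase probabilities (unobservable in practice):
# p0 = P(buy | no incentive), p1 = P(buy | incentive).
p0 = rng.uniform(0.0, 1.0, n_users)
is_always = p0 > 0.8  # always-buy types: the incentive does not change them
p1 = np.where(is_always, p0,
              np.clip(p0 + rng.uniform(0.0, 0.3, n_users), 0.0, 1.0))
true_uplift = p1 - p0

# Noisy uplift estimates, as an uplift model would produce; the noise lets
# some zero-uplift Always Buyers look attractive to a naive ranking.
est_uplift = true_uplift + rng.normal(0.0, 0.15, n_users)

# Naive policy: spend the whole budget on the top users by estimated uplift.
naive = np.argsort(-est_uplift)[:budget]

# Counterfactual-aware policy: first screen out predicted Always Buyers,
# then rank the remaining users by estimated uplift. A perfect screen is
# assumed here purely for illustration.
eligible = np.flatnonzero(~is_always)
aware = eligible[np.argsort(-est_uplift[eligible])][:budget]

print("Always Buyers incentivized (naive):", int(is_always[naive].sum()))
print("true total uplift, naive: %.1f" % true_uplift[naive].sum())
print("true total uplift, aware: %.1f" % true_uplift[aware].sum())

Under this simulation, the naive ranking wastes part of the budget on zero-uplift Always Buyers whose estimates are inflated by noise, while the screened policy reallocates those incentives to persuadable users; the paper replaces the oracle screen used here with identifiable counterfactual estimates and proves a lower bound on the resulting policy reward.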
Pages: 1235-1247
Page count: 13