Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning

Cited by: 0

Authors:

Wang, Haozhe [1]

Du, Chao [1]

Pang, Panyan [1]

He, Li [1]

Wang, Liang [1]

Zheng, Bo [1]

Affiliations:

[1] Alibaba Group, Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023

Keywords:

Constrained Bidding; Reinforcement Learning; Causality; Auction

DOI:

10.1145/3580305.3599254

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

The proliferation of the Internet has led to the emergence of online advertising, driven by the mechanics of online auctions. In these repeated auctions, software agents participate on behalf of aggregated advertisers to optimize their long-term utility. To meet diverse demands, bidding strategies are employed to optimize advertising objectives subject to different spending constraints. Existing approaches to constrained bidding typically rely on i.i.d. training and test conditions, which contradicts the adversarial nature of online ad markets, where different parties have potentially conflicting objectives. We therefore study constrained bidding in adversarial bidding environments, assuming no knowledge of the adversarial factors. Instead of relying on the i.i.d. assumption, our insight is to align the training distribution of environments with the potential test distribution while minimizing policy regret. Based on this insight, we propose a practical Minimax Regret Optimization (MiRO) approach that interleaves a teacher, which finds adversarial environments for tutoring, with a learner, which meta-learns its policy over the given distribution of environments. In addition, we are the first to incorporate expert demonstrations for learning bidding strategies. Through a causality-aware policy design, we improve upon MiRO by distilling knowledge from the experts. Extensive experiments on both industrial and synthetic data show that our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
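As a rough illustration of the minimax regret criterion named in the abstract, the sketch below picks, from a small finite set of candidate bidding policies, the one whose worst-case regret across environments is smallest. It is a generic decision-theoretic toy, not the MiRO teacher/learner procedure from the paper: the function name `minimax_regret_choice`, the utility table, and the definition of regret relative to the best candidate in each environment are all assumptions made for illustration.

```python
def minimax_regret_choice(utility):
    """Pick the row (policy) with the smallest worst-case regret.

    utility[i][j] is a hypothetical utility of candidate policy i in
    environment j. Regret is measured against the best candidate in
    each environment (an assumption of this toy example).
    """
    n_envs = len(utility[0])
    # Best achievable utility in each environment among the candidates.
    best_per_env = [max(row[j] for row in utility) for j in range(n_envs)]
    # Worst-case (maximum) regret of each policy over all environments.
    worst_case_regret = [
        max(best_per_env[j] - row[j] for j in range(n_envs))
        for row in utility
    ]
    # The minimax-regret policy minimizes that worst case.
    best_policy = min(range(len(utility)), key=worst_case_regret.__getitem__)
    return best_policy, worst_case_regret


# Made-up utilities: 3 candidate policies (rows) x 3 environments (columns).
utility = [
    [10.0, 2.0, 6.0],
    [7.0, 6.0, 5.0],
    [4.0, 8.0, 3.0],
]
policy_index, worst_case_regret = minimax_regret_choice(utility)
print(policy_index, worst_case_regret)
```

In this toy table the second policy is never the best in any single environment, yet it is selected because it is never far from the best, which is the kind of robustness a worst-case regret criterion favors.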


Pages: 2314-2325

Page count: 12

