Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning

Times cited: 0
Authors
Wang, Haozhe [1]
Du, Chao [1]
Pang, Panyan [1]
He, Li [1]
Wang, Liang [1]
Zheng, Bo [1]
Affiliations
[1] Alibaba Group, Beijing, People's Republic of China
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Keywords
Constrained Bidding; Reinforcement Learning; Causality; Auction
DOI
10.1145/3580305.3599254
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The proliferation of the Internet has led to the emergence of online advertising, driven by the mechanics of online auctions. In these repeated auctions, software agents participate on behalf of aggregated advertisers to optimize their long-term utility. To meet diverse advertiser demands, bidding strategies are employed to optimize advertising objectives subject to different spending constraints. Existing approaches to constrained bidding typically assume i.i.d. training and test conditions, which contradicts the adversarial nature of online ad markets, where different parties may hold conflicting objectives. We therefore study constrained bidding in adversarial bidding environments, assuming no knowledge of the adversarial factors. Instead of relying on the i.i.d. assumption, our insight is to align the training distribution of environments with the potential test distribution while minimizing policy regret. Based on this insight, we propose a practical Minimax Regret Optimization (MiRO) approach that alternates between a teacher, which finds adversarial environments for tutoring, and a learner, which meta-learns its policy over the given distribution of environments. In addition, we are the first to incorporate expert demonstrations into learning bidding strategies. Through a causality-aware policy design, we improve upon MiRO by distilling knowledge from the experts. Extensive experiments on both industrial and synthetic data show that our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
Pages: 2314–2325
Number of pages: 12
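The abstract describes MiRO as alternating between a teacher that searches for adversarial environments and a learner that optimizes its policy over the resulting distribution of environments. The toy sketch below illustrates only that minimax-regret alternation, under loudly stated assumptions that do not come from the paper: the "policy" is a single linear bid multiplier `alpha` (bid = alpha × value), an environment is a fixed list of competing bids from a finite candidate set, auctions are second-price with a hard budget, and plain grid search stands in for both the meta-learned reinforcement-learning learner and the adversarial teacher. All names (`utility`, `regret`, `worst_regret`, `minimax_regret`) and the synthetic data are hypothetical; this is not the authors' implementation, and the causality-aware expert distillation is not modeled.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical environment: (impression values, highest competing bid per impression)
Env = Tuple[List[float], List[float]]


def utility(alpha: float, env: Env, budget: float) -> float:
    """Value won by a linear bidder (bid = alpha * value) in second-price auctions.

    An impression is skipped if paying for it would exceed the budget,
    so the spending constraint is never violated.
    """
    values, comp_bids = env
    won, spend = 0.0, 0.0
    for v, c in zip(values, comp_bids):
        if alpha * v > c and spend + c <= budget:
            won += v
            spend += c  # second price: the winner pays the competing bid
    return won


def regret(alpha: float, env: Env, alphas: Sequence[float], budget: float) -> float:
    """Gap between the best multiplier in hindsight for env and the chosen alpha."""
    best = max(utility(a, env, budget) for a in alphas)
    return best - utility(alpha, env, budget)


def worst_regret(alpha: float, envs: Sequence[Env], alphas: Sequence[float], budget: float) -> float:
    """Worst-case regret of alpha over a set of environments."""
    return max(regret(alpha, e, alphas, budget) for e in envs)


def minimax_regret(envs: List[Env], alphas: Sequence[float], budget: float):
    """Alternate a learner and a teacher until the teacher finds no new adversary."""
    pool = [envs[0]]  # training distribution starts with a single environment
    alpha = alphas[0]
    # Each round either stops or adds a new env to the pool, so len(envs) rounds suffice.
    for _ in range(len(envs)):
        # Learner: multiplier minimizing worst-case regret over the current pool
        alpha = min(alphas, key=lambda a: worst_regret(a, pool, alphas, budget))
        # Teacher: candidate environment on which this multiplier regrets the most
        worst = max(envs, key=lambda e: regret(alpha, e, alphas, budget))
        if worst in pool:  # nothing new to teach: alpha is minimax over all envs
            break
        pool.append(worst)
    return alpha, pool


# Usage on synthetic data: shared impressions, four competitor intensities.
rng = random.Random(0)
n = 50
values = [rng.uniform(0.5, 2.0) for _ in range(n)]
envs = [(values, [s * rng.uniform(0.2, 1.5) for _ in range(n)]) for s in (0.5, 1.0, 1.5, 2.0)]
alphas = [0.5 + 0.1 * k for k in range(21)]
BUDGET = 20.0
alpha, pool = minimax_regret(envs, alphas, BUDGET)
print(f"alpha={alpha:.1f}, adversarial pool size={len(pool)}")
```

Design note: the learner only ever optimizes against the pool, so stopping as soon as the teacher re-proposes a pooled environment certifies that the returned `alpha` minimizes worst-case regret over every candidate environment. This is a cheap analogue of growing the training distribution toward the potential test distribution instead of assuming the two are i.i.d.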