A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

Cited: 0
Authors
Afshar, Reza Refaei [1 ]
Rhuggenaath, Jason [1 ]
Zhang, Yingqian [1 ]
Kaymak, Uzay [1 ]
Affiliations
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Real Time Bidding; Reinforcement Learning; Reward Shaping; Deep Learning;
DOI
10.1109/IJCNN52387.2021.9533817
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real Time Bidding is the process of selling and buying online advertisements through real-time auctions. These auctions are run by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price must be set for each ad placement or impression. This reserve price is normally determined by the bids of header bidding partners. However, the ad exchange may outbid higher reserve prices, so optimizing this value largely affects revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Normally, ad exchanges return no information about an auction except its sold/unsold status. This binary feedback is not suitable for maximizing revenue because it contains no explicit information about the revenue itself. To enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal to the reinforcement learning agent. In this approach, different intervals of the reserve price receive different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. Results show that our proposed method achieves higher revenue than the baselines.
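The interval-based reward shaping described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the specific penalty for unsold impressions, and the example interval edges and weights are all assumptions; the paper learns the per-interval values through a search procedure.

```python
import numpy as np

def shaped_reward(reserve_price, sold, interval_edges, weights):
    """Turn binary sold/unsold feedback into a shaped, interval-dependent reward.

    interval_edges: ascending bin edges partitioning the reserve-price range.
    weights: one learned weight per interval (here, hypothetical values;
             the paper obtains them via a search procedure).
    """
    # Find which reserve-price interval the chosen price falls into.
    k = int(np.clip(np.searchsorted(interval_edges, reserve_price) - 1,
                    0, len(weights) - 1))
    # Sold impressions earn the interval weight scaled by the reserve price;
    # unsold impressions incur the interval weight as a penalty.
    return weights[k] * reserve_price if sold else -weights[k]

# Hypothetical example: four intervals over prices in [0, 2].
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
w = np.array([0.2, 0.5, 1.0, 0.3])
r_sold = shaped_reward(1.2, sold=True, interval_edges=edges, weights=w)
r_unsold = shaped_reward(0.3, sold=False, interval_edges=edges, weights=w)
```

Compared with the raw binary signal, this shaped reward distinguishes between reserve prices that fall in historically profitable intervals and those that do not, giving the agent a gradient toward revenue rather than mere sell-through.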
Pages: 8
Related papers
50 records in total
  • [1] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 653 - 659
  • [2] An Improvement on Mapless Navigation with Deep Reinforcement Learning: A Reward Shaping Approach
    Alipanah, Arezoo
    Moosavian, S. Ali A.
    2022 10TH RSI INTERNATIONAL CONFERENCE ON ROBOTICS AND MECHATRONICS (ICROM), 2022, : 261 - 266
  • [3] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
  • [4] Reward Shaping Based Federated Reinforcement Learning
    Hu, Yiqiu
    Hua, Yun
    Liu, Wenyan
    Zhu, Jun
    IEEE ACCESS, 2021, 9 : 67259 - 67267
  • [5] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong
    Cao, Weihua
    Guo, Linwei
    Gan, Chao
    Wu, Min
    2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023,
  • [6] Plan-based Reward Shaping for Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    2008 4TH INTERNATIONAL IEEE CONFERENCE INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2008, : 416 - 423
  • [7] Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management
    De Moor, Bram J.
    Gijsbrechts, Joren
    Boute, Robert N.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 301 (02) : 535 - 545
  • [8] Online VNF Placement using Deep Reinforcement Learning and Reward Constrained Policy Optimization
    Mohamed, Ramy
    Avgeris, Marios
    Leivadeas, Aris
    Lambadaris, Ioannis
    2024 IEEE INTERNATIONAL MEDITERRANEAN CONFERENCE ON COMMUNICATIONS AND NETWORKING, MEDITCOM 2024, 2024, : 269 - 274
  • [9] Optimizing Reinforcement Learning Agents in Games Using Curriculum Learning and Reward Shaping
    Khan, Adil
    Muhammad, Muhammad
    Naeem, Muhammad
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2025, 36 (01)
  • [10] Bi-level Optimization Method for Automatic Reward Shaping of Reinforcement Learning
    Wang, Ludi
    Wang, Zhaolei
    Gong, Qinghai
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 382 - 393