A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

Cited: 0
Authors
Afshar, Reza Refaei [1 ]
Rhuggenaath, Jason [1 ]
Zhang, Yingqian [1 ]
Kaymak, Uzay [1 ]
Affiliations
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Real Time Bidding; Reinforcement Learning; Reward Shaping; Deep Learning;
DOI
10.1109/IJCNN52387.2021.9533817
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real Time Bidding is the process of selling and buying online advertisements through real-time auctions. These auctions are run by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price must be set for each ad placement or impression. This reserve price is normally determined by the bids of header bidding partners. However, the ad exchange may outbid higher reserve prices, so optimizing this value largely affects revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Normally, ad exchanges return no information about an auction except its sold/unsold status. This binary feedback is not suitable for maximizing revenue because it contains no explicit information about the revenue itself. To enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal to the reinforcement learning agent. In this approach, different intervals of the reserve price receive different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. Results show that our proposed method achieves higher revenue than the baselines.
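The interval-based reward shaping described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the specific penalty for unsold impressions, and the example interval edges and weights are all assumptions; the paper learns the per-interval values through a search procedure.

```python
import numpy as np

def shaped_reward(reserve_price, sold, interval_edges, weights):
    """Turn binary sold/unsold feedback into a shaped, interval-dependent reward.

    interval_edges: ascending bin edges partitioning the reserve-price range.
    weights: one learned weight per interval (here, hypothetical values;
             the paper obtains them via a search procedure).
    """
    # Find which reserve-price interval the chosen price falls into.
    k = int(np.clip(np.searchsorted(interval_edges, reserve_price) - 1,
                    0, len(weights) - 1))
    # Sold impressions earn the interval weight scaled by the reserve price;
    # unsold impressions incur the interval weight as a penalty.
    return weights[k] * reserve_price if sold else -weights[k]

# Hypothetical example: four intervals over prices in [0, 2].
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
w = np.array([0.2, 0.5, 1.0, 0.3])
r_sold = shaped_reward(1.2, sold=True, interval_edges=edges, weights=w)
r_unsold = shaped_reward(0.3, sold=False, interval_edges=edges, weights=w)
```

Compared with the raw binary signal, this shaped reward distinguishes between reserve prices that fall in historically profitable intervals and those that do not, giving the agent a gradient toward revenue rather than mere sell-through.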
Pages: 8
Related papers
50 records in total
  • [1] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 653 - 659
  • [2] An Improvement on Mapless Navigation with Deep Reinforcement Learning: A Reward Shaping Approach
    Alipanah, Arezoo
    Moosavian, S. Ali A.
    2022 10TH RSI INTERNATIONAL CONFERENCE ON ROBOTICS AND MECHATRONICS (ICROM), 2022, : 261 - 266
  • [3] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
  • [4] Reward Shaping Based Federated Reinforcement Learning
    Hu, Yiqiu
    Hua, Yun
    Liu, Wenyan
    Zhu, Jun
    IEEE ACCESS, 2021, 9 : 67259 - 67267
  • [5] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong
    Cao, Weihua
    Guo, Linwei
    Gan, Chao
    Wu, Min
    2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023,
  • [6] Plan-based Reward Shaping for Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    2008 4TH INTERNATIONAL IEEE CONFERENCE INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2008, : 416 - 423
  • [7] Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management
    De Moor, Bram J.
    Gijsbrechts, Joren
    Boute, Robert N.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 301 (02) : 535 - 545
  • [8] Online VNF Placement using Deep Reinforcement Learning and Reward Constrained Policy Optimization
    Mohamed, Ramy
    Avgeris, Marios
    Leivadeas, Aris
    Lambadaris, Ioannis
    2024 IEEE INTERNATIONAL MEDITERRANEAN CONFERENCE ON COMMUNICATIONS AND NETWORKING, MEDITCOM 2024, 2024, : 269 - 274
  • [9] Optimizing Reinforcement Learning Agents in Games Using Curriculum Learning and Reward Shaping
    Khan, Adil
    Muhammad, Muhammad
    Naeem, Muhammad
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2025, 36 (01)
  • [10] Bi-level Optimization Method for Automatic Reward Shaping of Reinforcement Learning
    Wang, Ludi
    Wang, Zhaolei
    Gong, Qinghai
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 382 - 393