A deep Q-learning approach to optimize ordering and dynamic pricing decisions in the presence of strategic customers

Cited by: 6

Authors:

Alamdar, Parisa Famil [1]

Seifi, Abbas [1]

Affiliations:

[1] Amirkabir Univ Technol, Tehran Polytech, Dept Ind Engn & Management Syst, Tehran, Iran

Source:

INTERNATIONAL JOURNAL OF PRODUCTION ECONOMICS | 2024 / Volume 269

Keywords:

Deep reinforcement learning; Dynamic pricing; Strategic customer; Neural network demand model; Multiple substitute products; INVENTORY; MODELS; CHOICE;

DOI:

10.1016/j.ijpe.2024.109154

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

In this paper, we present an optimization method for analyzing a retailer's simultaneous decisions on dynamic pricing and ordering quantities for seasonal products under monopolistic conditions. Customers are assumed to be strategic and may postpone their purchases to obtain a lower price in the future. The problem is investigated in the context of multiple substitute products. We develop a model based on deep neural networks to estimate customer demand. The problem is complex and cannot be solved using classical optimization methods. Therefore, we develop a reinforcement learning algorithm, called the deep Q-learning (DQL) algorithm, to solve it. The proposed algorithm combines a Q-learning algorithm with two deep neural networks for the primary and discount sales periods, and uses the neural networks to estimate the Q-values in a large space of states and actions. The performance of the demand model and of the proposed optimization algorithm is tested on a real-world dataset taken from the clothing industry. The results of our experiments demonstrate that the proposed demand model performs better than a fully connected neural network-based model and a latent class model tested in this paper. Furthermore, the performance of the DQL algorithm is significantly superior to that of two other algorithms, simulated annealing and a genetic algorithm. In addition, a comparison between the DQL algorithm and another reinforcement learning algorithm, State-Action-Reward-State-Action (SARSA), indicates that the proposed algorithm yields higher revenues and takes less time to converge. Consequently, the proposed algorithm has high potential for solving such a large-scale integrated pricing and ordering optimization problem.
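The abstract contrasts Q-learning with SARSA. As a minimal sketch of the textbook difference between the two (not the authors' implementation, which estimates Q-values with deep neural networks), the update targets can be written as below. The function names, the discount factor `gamma`, the learning rate `alpha`, and the example numbers are illustrative assumptions and do not come from the paper.

```python
# Textbook temporal-difference targets; all names and values are hypothetical.

def q_learning_target(reward, next_q_values, gamma, terminal=False):
    """Off-policy target: bootstrap from the best next action."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma, terminal=False):
    """On-policy target: bootstrap from the next action actually taken."""
    if terminal:
        return reward
    return reward + gamma * next_q_values[next_action]


def td_update(q_value, target, alpha):
    """Move the current Q-value estimate a step of size alpha toward the target."""
    return q_value + alpha * (target - q_value)


# Hypothetical example: three candidate price levels in the next state.
next_q = [4.0, 6.0, 5.0]
q_target = q_learning_target(reward=10.0, next_q_values=next_q, gamma=0.5)
s_target = sarsa_target(reward=10.0, next_q_values=next_q, next_action=0, gamma=0.5)
new_q = td_update(q_value=8.0, target=q_target, alpha=0.5)
print(q_target, s_target, new_q)
```

In a deep Q-learning setting, the table lookup implied here would typically be replaced by a neural network that maps a state to one Q-value per action, with the same target used as the regression label.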

Pages: 17
