Application of Reinforcement Learning in Airline Revenue Management

Cited: 0
Authors
Sklenar, Jaroslav [1 ]
Borg, David Stephen [1 ]
Popela, Pavel [2 ]
Affiliations
[1] Univ Malta, MSD 2080 Msida, Malta
[2] Brno Univ Technol, Tech 2896/2, Brno 61669, Czech Republic
Source
INTERNATIONAL CONFERENCE PDMU-2012: PROBLEMS OF DECISION MAKING UNDER UNCERTAINTIES | 2012
Keywords
revenue management; stochastic optimal control problem; reinforcement learning; MODEL; ALLOCATION;
DOI
Not available
CLC Number
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Revenue Management (RM) is nowadays an essential tool in large industries, especially for airline companies. It aims at optimising revenues through better control of several factors, predominantly inventory and pricing. In this paper we consider a stochastic optimal control problem that consists of finding an optimal policy for deciding when it is profitable (or not) for an airline to accept a group booking request. The trade-off is well known. Accepting a group request blocks seats that would otherwise be available to individual passengers: groups request a discounted price in the early days after a flight opens for booking, and they carry a relatively high probability of future drop-outs (individual cancellations). On the other hand, rejecting most group requests might leave the planes under-booked. An optimal policy specifying when and under what conditions group requests should be accepted is therefore required. Our attempt to solve this problem is based on the infinite-horizon Stochastic Dynamic Programming (SDP) approach. More specifically, the decision-making problem is described as a Markov Decision Problem (MDP) whose states are based on the time until departure and the number of seats booked at that time. The state definition is of course a simplification of the more complicated real booking situation. Nonetheless, it results in almost 3 million distinct states, which makes the classical solution based on transition and reward matrices intractable. To overcome the curse of dimensionality, and especially the curse of modelling, we apply the solution technique called Reinforcement Learning (RL). In the SDP context this technique is based on simulating the rewards associated with decisions. A policy close to the optimum can thus be found in acceptable time without explicit transition probabilities and rewards.
Pages: 171-180
Number of pages: 10
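The record gives only the abstract, so the following is a minimal, hypothetical sketch of the kind of simulation-based approach it describes: a tabular Q-learning agent over states (days to departure, seats booked) that learns an accept/reject rule for group booking requests. All names and numbers (CAPACITY, GROUP_SIZE, GROUP_DISCOUNT, the arrival and drop-out probabilities, and the learning parameters) are assumptions made for illustration, Q-learning merely stands in for whichever simulation-based RL method the authors actually used, and the sketch ignores fare classes, overbooking and resale of cancelled seats.

```python
import random
from collections import defaultdict

# Illustrative constants only -- the paper does not report these values.
CAPACITY = 200        # seats on the flight
HORIZON = 100         # booking days before departure
GROUP_SIZE = 20       # assumed fixed size of a group request
GROUP_DISCOUNT = 0.7  # fraction of the full fare paid per group passenger
FULL_FARE = 1.0       # revenue of one individual booking (normalised)
P_GROUP = 0.10        # probability a group request arrives on a given day
P_INDIV = 0.80        # probability an individual request arrives on a given day
P_DROPOUT = 0.15      # probability that a group member later cancels

ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1  # learning rate, discount, exploration
ACTIONS = (0, 1)                       # 0 = reject the group request, 1 = accept

Q = defaultdict(float)  # tabular action values, keyed by ((days_left, seats), action)


def simulate_day(seats, action, group_arrived):
    """Simulate one booking day; return (reward, updated seat count)."""
    reward = 0.0
    if group_arrived and action == 1 and seats + GROUP_SIZE <= CAPACITY:
        # Accepting blocks GROUP_SIZE seats; drop-outs forfeit their discounted
        # fare and (as a simplification) the blocked seats are not resold.
        shows = sum(random.random() > P_DROPOUT for _ in range(GROUP_SIZE))
        reward += shows * GROUP_DISCOUNT * FULL_FARE
        seats += GROUP_SIZE
    if random.random() < P_INDIV and seats < CAPACITY:
        # One individual booking at full fare if a seat is still free.
        reward += FULL_FARE
        seats += 1
    return reward, seats


def greedy(state):
    """Action with the highest current Q-value in the given state."""
    return max(ACTIONS, key=lambda a: Q[state, a])


def train(episodes=50_000):
    """Epsilon-greedy Q-learning over simulated booking horizons."""
    for _ in range(episodes):
        seats = 0
        for days_left in range(HORIZON, 0, -1):
            state = (days_left, seats)
            group_arrived = random.random() < P_GROUP
            action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
            reward, seats = simulate_day(seats, action, group_arrived)
            next_best = 0.0 if days_left == 1 else max(Q[(days_left - 1, seats), a] for a in ACTIONS)
            Q[state, action] += ALPHA * (reward + GAMMA * next_best - Q[state, action])


if __name__ == "__main__":
    train()
    # Query the learned accept/reject rule for a few sample states.
    for days_left, seats in [(90, 10), (30, 150), (5, 190)]:
        decision = "accept" if greedy((days_left, seats)) == 1 else "reject"
        print(f"{days_left:3d} days left, {seats:3d} seats booked -> {decision} group request")
```

Running the script trains the table and prints the greedy decision for a few sample states; the point is only that an accept/reject policy can be estimated from simulated rewards alone, without building the transition and reward matrices that the abstract reports as intractable.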