Online Learning in Limit Order Book Trade Execution

Cited by: 8
Authors
Akbarzadeh, Nima [1 ,2 ]
Tekin, Cem [2 ]
van der Schaar, Mihaela [3 ]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 0E9, Canada
[2] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Ankara, Turkey
[3] Oxford Man Inst Quantitat Finance, Oxford OX2 6ED, England
Funding
US National Science Foundation;
Keywords
Limit order book; Markov decision process; online learning; dynamic programming; bounded regret
DOI
10.1109/TSP.2018.2858188
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline codes
0808; 0809;
Abstract
In this paper, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain number of shares to sell and an allocated time window to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell via market orders at prespecified time slots within the allocated time interval. We model this problem as a Markov Decision Process (MDP), which is then solved by dynamic programming. First, we prove that the optimal policy has a specific form, which requires selling either no shares or the maximum allowed number of shares at each time slot. Then, we consider the learning problem, in which the state transition probabilities are unknown and need to be learned on the fly. We propose a learning algorithm that exploits the form of the optimal policy when choosing the amount to trade. Interestingly, this algorithm achieves bounded regret with respect to the optimal policy computed based on complete knowledge of the market dynamics. Our numerical results on several finance datasets show that, by exploiting the structure of the problem, the proposed algorithm performs significantly better than the traditional Q-learning algorithm.
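The dynamic-programming step described in the abstract can be illustrated with a minimal backward-induction sketch over a (time slot, remaining inventory) state space. Everything below is a toy placeholder: the deterministic convex cost function, the horizon, the inventory size, and the per-slot trading cap are all invented for illustration and are not the paper's stochastic limit-order-book model (which would make the Bellman recursion an expectation over learned transition probabilities).

```python
# Backward-induction sketch for an execution MDP.
# State: (time slot t, remaining inventory i); action: shares sold at t.
# The cost model and all constants are hypothetical, not the paper's.

T = 4          # number of time slots (toy value)
I = 5          # initial inventory to liquidate (toy value)
A_MAX = 3      # maximum shares sellable per slot (hypothetical cap)

def cost(a):
    # Toy convex temporary-impact cost of selling a shares in one slot.
    return a + 0.5 * a * a

# V[t][i]: minimal cost to liquidate i shares from slot t onward.
V = [[0.0] * (I + 1) for _ in range(T + 1)]
policy = [[0] * (I + 1) for _ in range(T)]

# Terminal condition: leftover inventory is force-liquidated at a penalty.
for i in range(I + 1):
    V[T][i] = 2.0 * cost(i)

# Bellman backward recursion over slots and inventory levels.
for t in reversed(range(T)):
    for i in range(I + 1):
        best_a, best_v = 0, float("inf")
        for a in range(min(i, A_MAX) + 1):
            v = cost(a) + V[t + 1][i - a]
            if v < best_v:
                best_a, best_v = a, v
        V[t][i] = best_v
        policy[t][i] = best_a
```

In the paper's setting the inner minimization is restricted further: the structural result says only a = 0 or the maximum allowed amount need be considered, which is what shrinks the search space for the learning algorithm.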
Pages: 4626-4641
Page count: 16