A simple learning agent interacting with an agent-based market model

Cited by: 4
Authors
Dicks, Matthew [1 ]
Paskaramoorthy, Andrew [1 ]
Gebbie, Tim [1 ]
Affiliations
[1] Univ Cape Town, Dept Stat Sci, ZA-7700 Cape Town, South Africa
Keywords
Strategic order-splitting; Reinforcement learning; Market simulation; Agent-based model; Price impact
DOI
10.1016/j.physa.2023.129363
Chinese Library Classification
O4 [Physics]
Discipline code
0702
Abstract
We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event-driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered for different initial order sizes and differently sized state spaces. The resulting impact on the agent-based model and the market is assessed using a calibration approach that explores changes in the empirical stylised facts and the price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. Agents with smaller state spaces converged on the set of states they visit much faster than agents with larger state spaces, and they were able to begin learning intuitive trading behaviour from the spread and volume states. We find that the moments of the model are robust to the impact of the learning agent, except for the Hurst exponent, which is lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign auto-correlations and increase the micro-price volatility as trading volumes increase.
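The abstract describes a tabular reinforcement learning execution agent whose state is built from the spread and the remaining order volume, with actions that split a parent order into child orders. As a rough illustration only, the sketch below shows a generic epsilon-greedy Q-learning loop over such a discretised state; the class and function names, the bucket boundaries, and the stubbed market with linear impact are all hypothetical and stand in for the paper's event-driven agent-based model and matching engine, not the authors' implementation.

```python
# Hypothetical sketch of a tabular Q-learning execution agent.
# State = (spread bucket, remaining-inventory bucket); actions scale
# child-order sizes. The market here is a stub, not the paper's ABM.
import random
from collections import defaultdict

class QExecutionAgent:
    def __init__(self, n_actions=4, alpha=0.1, gamma=1.0, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q-table, grown lazily
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        # epsilon-greedy over the discretised (spread, volume) state
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return qs.index(max(qs))

    def update(self, s, a, r, s_next, done):
        # one-step Q-learning backup
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

def discretise(spread, remaining, total):
    # coarse buckets: wide/narrow spread x quartiles of inventory left
    return (int(spread > 1), min(3, int(4 * remaining / total)))

# toy episode loop against a stubbed market (random spread, linear impact)
agent, total = QExecutionAgent(), 1000
for episode in range(200):
    remaining = total
    s = discretise(random.choice([1, 2]), remaining, total)
    while remaining > 0:
        a = agent.act(s)
        child = min(remaining, (a + 1) * 50)   # action sets child-order size
        remaining -= child
        r = -0.001 * child                     # reward: negative stub slippage
        s_next = discretise(random.choice([1, 2]), remaining, total)
        agent.update(s, a, r, s_next, remaining == 0)
        s = s_next
```

In the paper itself the agent trades through the matching engine of the agent-based model rather than a stub; the sketch only fixes the ideas of a discretised spread-and-volume state and incremental Q-value updates while an order is worked down to zero.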
Pages: 18