Model-predictive control and reinforcement learning in multi-energy system case studies

Cited: 73
Authors
Ceusters, Glenn [1 ,2 ,3 ]
Rodriguez, Roman Cantu [4 ,6 ]
Garcia, Alberte Bouso [4 ]
Franke, Rudiger [1 ]
Deconinck, Geert [4 ,6 ]
Helsen, Lieve [5 ,6 ]
Nowe, Ann [3 ]
Messagie, Maarten [2 ]
Camargo, Luis Ramirez [2 ]
Affiliations
[1] ABB, Hoge Wei 27, B-1930 Zaventem, Belgium
[2] Vrije Univ Brussel VUB, ETEC MOBI, Pleinlaan 2, B-1050 Brussels, Belgium
[3] Vrije Univ Brussel VUB, AI Lab, Pleinlaan 2, B-1050 Brussels, Belgium
[4] Katholieke Univ Leuven, ESAT ELECTA, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
[5] Katholieke Univ Leuven, Dept Mech Engn TME, Celestijnenlaan 300, B-3001 Leuven, Belgium
[6] EnergyVille, B-3600 Genk, Belgium
Keywords
Model-predictive control; Reinforcement learning; Optimal control; Multi-energy systems; Demand-side management; Energy management; Optimization
DOI
10.1016/j.apenergy.2021.117634
Chinese Library Classification (CLC)
TE [Petroleum and natural gas industry]; TK [Energy and power engineering]
Discipline Codes
0807; 0820
Abstract
Model predictive control (MPC) is an optimal control technique that keeps the total operating cost of multi-energy systems at a minimum while satisfying all system constraints. However, it presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive, and which carries an initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach that assumes no model a priori and benchmark it against a linear MPC (LMPC, chosen to reflect current practice, although non-linear MPC performs better), with both controllers derived from the general optimal control problem to highlight their differences and similarities. In a simple multi-energy system (MES) configuration case study, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent can match and even outperform the perfect-foresight LMPC benchmark (101.5%), whereas the realistic LMPC, i.e. with imperfect predictions, achieves only 98%. In a more complex MES configuration, the RL agent's performance is generally lower (94.6%), yet still better than that of the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of two years of quarter-hourly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems, given adequate constraint handling and pre-training to avoid unsafe interactions and long training periods, as proposed in fundamental future work.
Pages: 12
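
For orientation, the sketch below shows the kind of model-free setup the abstract describes: a twin delayed deep deterministic policy gradient (TD3) agent interacting with a multi-energy-system environment at quarter-hour steps. It is a minimal illustration only, not the authors' implementation: the toy environment, its CHP efficiencies, price and demand signals, and every parameter value are hypothetical stand-ins, and the agent is the off-the-shelf TD3 from stable-baselines3.

```python
# Minimal, hypothetical sketch of a TD3 agent on a toy multi-energy-system
# environment. Nothing here reproduces the paper's case studies: the dynamics,
# signals, and parameter values are illustrative stand-ins.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise


class ToyMESEnv(gym.Env):
    """Toy single-node MES: one CHP unit serving a heat demand while
    buying/selling electricity at a time-varying price (all made up)."""

    def __init__(self, horizon=96):  # 96 quarter-hour steps = one day
        super().__init__()
        self.horizon = horizon
        self.t = 0
        # Observation: [electricity price, heat demand], both normalised.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
        # Action: CHP load fraction in [0, 1].
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

    def _obs(self):
        price = 0.5 + 0.5 * np.sin(2 * np.pi * self.t / self.horizon)
        demand = 0.5 + 0.4 * np.cos(2 * np.pi * self.t / self.horizon)
        return np.array([price, demand], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        price, demand = self._obs()
        chp = float(action[0])               # CHP load fraction chosen by the agent
        heat, power = 0.6 * chp, 0.4 * chp   # made-up conversion efficiencies
        unmet = max(demand - heat, 0.0)      # heat shortfall is penalised
        cost = 0.8 * chp + 2.0 * unmet - price * power  # fuel + penalty - revenue
        self.t += 1
        terminated = self.t >= self.horizon
        return self._obs(), -cost, terminated, False, {}


env = ToyMESEnv()
noise = NormalActionNoise(mean=np.zeros(1), sigma=0.1 * np.ones(1))
agent = TD3("MlpPolicy", env, action_noise=noise, verbose=0)
agent.learn(total_timesteps=10_000)  # demo run; the paper trains for 2 simulated years
```

Constraint handling is exactly what such a sketch glosses over: here unmet demand is merely penalised in the reward, whereas the abstract stresses adequate constraint handling and pre-training before an RL controller can act safely on a real system.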