Control of HVAC-systems with Slow Thermodynamic Using Reinforcement Learning

Cited by: 6

Authors:

Blad, C. [2,4]

Koch, S. [1]

Ganeswarathas, S. [1]

Kallesoe, C. S. [3,4]

Bogh, S. [1,2]

Affiliations:

[1] Aalborg Univ, Dept Mat & Prod, Fibigerstr 16, DK-9220 Aalborg, Denmark

[2] Aalborg Univ, Dept Mat & Prod, Robot & Automat Grp, Fibigerstr 16, DK-9220 Aalborg, Denmark

[3] Aalborg Univ, Dept Elect Syst, Fredrik Bajersvej 7, DK-9220 Aalborg, Denmark

[4] Grundfos AS, Poul Due Jensens Vej 7, DK-8850 Bjerringbro, Denmark

Source:

29TH INTERNATIONAL CONFERENCE ON FLEXIBLE AUTOMATION AND INTELLIGENT MANUFACTURING (FAIM 2019): BEYOND INDUSTRY 4.0: INDUSTRIAL ADVANCES, ENGINEERING EDUCATION AND INTELLIGENT MANUFACTURING | 2019 / Vol. 38

Keywords:

Sustainable Manufacturing Engineering and Resource-Efficient Production; Artificial Intelligence in Manufacturing; Modelling and Simulation; HVAC-Systems;

DOI:

10.1016/j.promfg.2020.01.159

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes an adaptive controller based on Reinforcement Learning (RL) for HVAC-systems with slow thermodynamics. Two RL algorithms based on Q-Networks (QNs) are investigated. The HVAC-system in this study is an underfloor heating system. Underfloor heating is of particular interest because it is very common in Scandinavia, but the research can be applied to a wide range of HVAC-systems, industrial processes, and other control applications dominated by very slow dynamics. The simulated environments consist of one, two, and four zones within a house, so the agents are exposed to gradually more complex environments, organized into test levels. The novelty of this paper is the application of two RL algorithms to industrial process control: a QN and a QN with Eligibility Trace (QN+ET). Eligibility traces are used because an underfloor heating environment is dominated by slow dynamics; with an eligibility trace, the agent can find correlations between the reward and actions taken in earlier iterations. © 2019 The Authors. Published by Elsevier B.V.
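The role of the eligibility trace described in the abstract can be illustrated with a minimal sketch. It is not the authors' QN+ET implementation: it assumes a tabular Q-function instead of a Q-Network, a naive accumulating trace (no trace reset on exploratory actions), and a hypothetical three-step chain in which the reward arrives only on the last transition, standing in for a slow thermal response. The names `q_lambda_update` and `run_episode` and all parameter values are illustrative assumptions.

```python
import numpy as np


def q_lambda_update(q, trace, state, action, reward, next_state,
                    alpha=0.1, gamma=0.95, lam=0.9):
    """One tabular Q-learning step with an accumulating eligibility trace.

    Assumption: naive Q(lambda); traces are never cut on exploratory actions.
    q and trace are modified in place; the TD error is returned.
    """
    # Temporal-difference error for the transition just observed.
    td_error = reward + gamma * np.max(q[next_state]) - q[state, action]
    # Mark the visited state-action pair as eligible for credit.
    trace[state, action] += 1.0
    # Every pair is updated in proportion to its current eligibility.
    q += alpha * td_error * trace
    # Older pairs become less eligible as time passes.
    trace *= gamma * lam
    return td_error


def run_episode(lam):
    """Hypothetical chain 0 -> 1 -> 2 -> 3; reward only on the last step."""
    q = np.zeros((4, 2))       # 4 states, 2 actions (illustrative sizes)
    trace = np.zeros_like(q)
    transitions = [(0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 0, 1.0, 3)]
    for state, action, reward, next_state in transitions:
        q_lambda_update(q, trace, state, action, reward, next_state, lam=lam)
    return q


if __name__ == "__main__":
    print("without trace:", run_episode(0.0)[:3, 0])
    print("with trace:   ", run_episode(0.9)[:3, 0])
```

With `lam=0.0` only the last state-action pair is credited after one episode; with `lam=0.9` the earlier pairs receive a decayed share of the same reward, which is the effect the abstract attributes to eligibility traces.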

Pages: 1308-1315

Page count: 8

References (10 in total):

[1] Andrychowicz M., 2017, Advances in neural information processing systems, P5048

[2] [Anonymous], 2015, 3 INT C LEARN REPR I

[3] Hessel Matteo, 32 AAAI C ARTIFICIAL

[4] Self-improving reactive agents based on reinforcement learning, planning and teaching [J]. Lin, L. J. Machine Learning, 1992, 8 (3-4): 293-321

[5] Human-level control through deep reinforcement learning [J]. Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. Nature, 2015, 518 (7540): 529-533

[6] Model predictive control of a building heating system: The first experience [J]. Privara, Samuel; Siroky, Jan; Ferkl, Lukas; Cigler, Jiri. Energy and Buildings, 2011, 43 (2-3): 564-572

[7] Schaul T., 2016, P ICLR

[8] Sutton RS, 2018, ADAPT COMPUT MACH LE, P1

[9] THRUN S, 1994, PROCEEDINGS OF THE 1993 CONNECTIONIST MODELS SUMMER SCHOOL, P255

[10] Tsitsiklis John N., 1997, T AUTOMATIC CONTROL, V42
