Reinforcement learning in batch processes

Cited by: 0

Authors:

Wilson, JA [1]

Martinez, EC [1]

Affiliations:

[1] Univ Nottingham, Sch Chem Environm & Mining Engn, Nottingham NG7 2RD, England

Source:

APPLICATION OF NEURAL NETWORKS AND OTHER LEARNING TECHNOLOGIES IN PROCESS ENGINEERING | 2001

DOI:

10.1142/9781848161467_0012

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Conventional methods for batch chemical process optimisation and control depend on having both perfect process models and measurements available. Here, to avoid this, we apply a novel methodology centred on reinforcement learning (RL) whereby, unlike most forms of machine learning, an autonomous agent is not instructed on how to act by example but instead learns directly by trying control actions and seeking those giving maximum reward. A central notion is the performance or value function that, in a given current state, signifies the contribution a specific action will make towards maximising the final performance or reward over an entire batch. For batch-to-batch, incremental learning and control, the initially unknown value function is here represented using wire fitting and a neural network. This is a simple yet powerful means of simultaneously learning and fitting the value function. The performance achieved in each completed batch can be propagated from the end point back through the intermediate states. With echoes of dynamic programming, this allows calculation of Bellman's errors, which can be minimised in neural network fitting. The higher level optimisation and control problem in batch processing thus fits neatly into this framework, and some results of a case study illustrate the potential of the approach.
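
The abstract names two ingredients, wire fitting and Bellman errors, without giving formulas. The following is a minimal Python sketch of both, not the authors' implementation. The interpolation follows the general inverse-distance form usually associated with wire fitting (reference [1] appears to be the originating report); the function names, the smoothing constants `c` and `eps`, the scalar action, the absence of discounting, and the assumption that reward arrives only at the end of the batch are all illustrative choices made here. In the approach the abstract describes, a neural network would supply the wires for each state; in this sketch they are passed in directly.

```python
def wire_fit(u, wires, c=0.1, eps=1e-9):
    """Interpolate a value for scalar action u from a set of wires.

    wires is a list of (u_i, q_i) pairs: a candidate action and its value.
    Each wire is weighted by the inverse of its squared distance to u,
    penalised by how far its value lies below the best wire.
    (c and eps are hypothetical smoothing constants.)
    """
    q_max = max(q for _, q in wires)
    num = 0.0
    den = 0.0
    for u_i, q_i in wires:
        dist = (u - u_i) ** 2 + c * (q_max - q_i) + eps
        num += q_i / dist
        den += 1.0 / dist
    return num / den


def greedy_wire(wires):
    """Return the (action, value) wire with the highest value.

    Because wire_fit is a weighted average of the q_i, it never exceeds
    the largest q_i, so the best wire gives the maximising action
    without a numerical search over u.
    """
    return max(wires, key=lambda w: w[1])


def bellman_errors(q_taken, v_next, final_reward):
    """Bellman errors for one completed batch (undiscounted, terminal reward only).

    q_taken[t]   : current estimate for the action taken at step t (T values)
    v_next[t]    : best estimated value in the state reached after step t,
                   for every step except the last (T - 1 values)
    final_reward : performance measured at the end of the batch
    """
    targets = list(v_next) + [final_reward]
    if len(targets) != len(q_taken):
        raise ValueError("v_next must be one element shorter than q_taken")
    return [target - q for target, q in zip(targets, q_taken)]
```

Under these assumptions, the end-of-batch reward is the target for the last control action, and each earlier action is scored against the best value estimated for its successor state; the resulting errors are what a network-fitting step would try to reduce.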

Pages: 269-286

Number of pages: 18

References (8 in total):

[1] BAIRD L, 1993, WLTR931147

[2] Bertsekas D.P., 2005, DYNAMIC PROGRAMMING, V1

[3] Bertsekas D. P., 1996, Neuro Dynamic Programming, V1st

[4] Bertsekas DP, 1995, Dynamic Programming and Optimal Control, V2

[5] Martinez, EC; Wilson, JA. A hybrid neural network first principles approach to batch unit optimisation [J]. COMPUTERS & CHEMICAL ENGINEERING, 1998, 22: S893-S896

[6] SUTTON RS, 1997, INTRO REINFORCEMENT

[7] Terwiesch P., 1994, Journal of Process Control, V4, P238, DOI 10.1016/0959-1524(94)80045-6

[8] Wilson JA, 1997, COMPUT CHEM ENG, V21, pS1233
