A new distributed reinforcement learning algorithm for multiple objective optimization problems

Cited by: 0

Authors:

Mariano, C

Morales, E

Affiliations:

[1] Inst Mexicano Tecnol Agua, Jiutepec 62550, Morelos, Mexico

[2] ITESM, Temixco 62589, Morelos, Mexico

Source:

ADVANCES IN ARTIFICIAL INTELLIGENCE | 2000 / Vol. 1952

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes a new algorithm, called MDQL, for solving multiple objective optimization problems. MDQL is based on a new distributed Q-learning algorithm, called DQL, which is also introduced in this paper. In DQL, a family of independent agents, each exploring different options, finds a common policy in a common environment. Information about action goodness is transmitted using traces over state-action pairs. MDQL extends this idea to multiple objectives by assigning a family of agents to each objective involved. A non-dominance criterion is used to construct Pareto fronts, and by delaying adjustments to the rewards, MDQL achieves better distributions of solutions. An extension for applying reinforcement learning to continuous functions is also given. Successful results of MDQL on several test-bed problems suggested in the literature are described.
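The abstract mentions that a non-dominance criterion is used to construct Pareto fronts. As a rough illustration of that general idea only, the following is a minimal sketch of a standard non-dominated filter, assuming all objectives are minimized. The function names `dominates` and `pareto_front` are hypothetical and are not taken from the paper, and the sketch does not reproduce MDQL's agents, traces, or reward adjustments.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming every objective is minimized.

    a dominates b when a is no worse than b in every objective
    and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (hypothetical helper)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative objective vectors (made-up values, two objectives each).
candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (2.0, 5.0)]
front = pareto_front(candidates)
```

Here `(3.0, 3.0)` is dominated by `(2.0, 2.0)` and `(2.0, 5.0)` is dominated by `(1.0, 4.0)`, so only the three remaining points form the front. The pairwise check is quadratic in the number of candidates, which is usually acceptable for small solution sets.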

Pages: 290-299

Number of pages: 10
