Accuracy-Based Learning Classifier Systems for Multistep Reinforcement Learning: A Fuzzy Logic Approach to Handling Continuous Inputs and Learning Continuous Actions

Cited by: 7

Authors:

Chen, Gang [1]

Douch, Colin I. J. [1]

Zhang, Mengjie [1]

Affiliation:

[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington 6140, New Zealand

Source:

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION | 2016 / Vol. 20 / No. 06

Keywords:

Fuzzy systems; gradient methods; learning systems; XCS; CONTROLLERS; PERFORMANCE;

DOI:

10.1109/TEVC.2016.2560139

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Despite their proven effectiveness, many Michigan learning classifier systems (LCSs) cannot perform multistep reinforcement learning in continuous spaces. To meet this technical challenge, some LCSs have been designed to learn fuzzy logic rules. They can be largely classified into strength-based and accuracy-based systems. The latter have gained more research attention over the last decade. However, existing accuracy-based learning systems either address primarily single-step learning problems or require the action space to be discrete. In this paper, a new accuracy-based learning fuzzy classifier system (LFCS) is developed to explicitly handle continuous state input and continuous action output during multistep reinforcement learning. Several technical improvements have been achieved while developing the new learning algorithm. In particular, we have successfully extended Q-learning-like credit assignment methods to continuous spaces. To enable direct learning of stochastic strategies for action selection, we have also proposed a new fuzzy logic system with stochastic action outputs. Moreover, fine-grained learning of fuzzy rules has been achieved effectively in our algorithm by using a natural gradient learning method. This is the first time that these techniques have been used substantially in any accuracy-based LFCS. Meanwhile, in comparison with several recently proposed learning algorithms, our algorithm is shown to perform highly competitively on four benchmark learning problems and a robotics problem. The practical usefulness of our algorithm is also demonstrated by improving the performance of a wireless body area network.
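The idea of a fuzzy logic system with stochastic action outputs can be illustrated with a minimal sketch. This is not the algorithm from the paper: the Gaussian membership functions, the two-rule base, the parameter values, and the way per-rule means and standard deviations are blended are all assumptions chosen only to show how fuzzy rules over a continuous state could define a probability distribution over a continuous action.

```python
import math
import random


def gaussian_membership(x, center, width):
    """Degree in (0, 1] to which input x matches a fuzzy set centred at `center`."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical rule base (assumption, not from the paper).
# Each rule: (set center, set width, action mean, action standard deviation).
RULES = [
    (0.0, 0.5, -1.0, 0.2),
    (1.0, 0.5, 1.0, 0.2),
]


def action_distribution(state, rules=RULES):
    """Blend per-rule Gaussian parameters by normalized firing strength."""
    strengths = [gaussian_membership(state, c, w) for c, w, _, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        # No rule fires (numerical underflow): fall back to equal weights.
        weights = [1.0 / len(rules)] * len(rules)
    else:
        weights = [s / total for s in strengths]
    mean = sum(wt * rule[2] for wt, rule in zip(weights, rules))
    std = sum(wt * rule[3] for wt, rule in zip(weights, rules))
    return mean, std


def sample_action(state, rng=random):
    """Draw one continuous action from the blended Gaussian."""
    mean, std = action_distribution(state)
    return rng.gauss(mean, std)
```

Calling `sample_action(0.5)` draws an action from a Gaussian centred midway between the two rules' preferred actions, because both rules fire equally at that state. A learning method would adjust the per-rule means and deviations; that step is omitted here.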


Pages: 953-971

Page count: 19
