A bionic learning algorithm based on Skinner's operant conditioning and control of robot

Cited by: 3
Authors
Ren H. [1]
Ruan X. [1]
Affiliations
[1] School of Electronic and Control Engineering, Beijing University of Technology
Source
Jiqiren/Robot | 2010, Vol. 32, No. 1
Keywords
Balance control; Eligibility trace; Self-learning; Skinner's operant conditioning; Two-wheeled robot
DOI
10.3724/SP.J.1218.2010.00132
Abstract
To address the motion balance control problem of a two-wheeled self-balancing mobile robot, a bionic self-learning algorithm that combines a BP (backpropagation) neural network with eligibility traces, based on Skinner's operant conditioning theory, is proposed as the robot's learning mechanism. The algorithm exploits the ability of eligibility traces to resolve the delayed-reward effect, accelerate learning, and improve reliability, so that the combined BP-and-eligibility-traces algorithm can predict the behavior evaluation value the robot would obtain and, through a probability tendency mechanism, select with a certain probability the optimal action corresponding to the largest evaluation value. In this way the two-wheeled robot acquires self-learning skills, as a human or animal does, by interacting with and being trained in an unknown environment, and achieves balance control of its motion. Finally, two simulation experiments compare the plain BP algorithm with the combined algorithm. The results show that the combined learning mechanism gives the robot better dynamic performance and faster learning, and demonstrates stronger self-learning and balance control abilities.
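
To make the mechanism concrete, the following is a minimal Python sketch of this kind of learning step, not the authors' implementation: a small BP network estimates an evaluation value for each candidate action, eligibility traces spread the temporal-difference error over recently active weights, and the action with the largest predicted value is selected with high probability (an epsilon-greedy rule stands in here for the paper's probability tendency mechanism). All names, network sizes, and hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: e.g. tilt angle, tilt rate, position, velocity.
N_STATE, N_HIDDEN, N_ACTION = 4, 8, 3
ALPHA, GAMMA, LAMBDA, EPSILON = 0.05, 0.95, 0.8, 0.1  # assumed hyperparameters

# One-hidden-layer BP network: state -> evaluation value per action.
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_STATE))
W2 = rng.normal(0.0, 0.1, (N_ACTION, N_HIDDEN))
e1, e2 = np.zeros_like(W1), np.zeros_like(W2)  # eligibility traces

def forward(s):
    h = np.tanh(W1 @ s)      # hidden activations
    return h, W2 @ h         # hidden layer, evaluation value of each action

def select_action(v):
    # Mostly take the action with the largest predicted evaluation,
    # exploring with small probability EPSILON.
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTION))
    return int(np.argmax(v))

def learn_step(s, a, r, s_next, done):
    # One update: TD error on the taken action, credit assigned
    # through decaying, accumulating eligibility traces.
    global W1, W2, e1, e2
    h, v = forward(s)
    _, v_next = forward(s_next)
    target = r if done else r + GAMMA * np.max(v_next)
    delta = target - v[a]                       # TD error

    g2 = np.zeros_like(W2)
    g2[a] = h                                   # dv[a]/dW2
    g1 = np.outer(W2[a] * (1.0 - h ** 2), s)    # dv[a]/dW1 (tanh backprop)

    e2 = GAMMA * LAMBDA * e2 + g2               # decay and accumulate traces
    e1 = GAMMA * LAMBDA * e1 + g1
    W2 += ALPHA * delta * e2                    # trace-weighted weight update
    W1 += ALPHA * delta * e1

# Tiny usage demo with a random transition standing in for the robot model;
# a reward of minus the tilt magnitude is an assumed stand-in signal.
s = rng.normal(size=N_STATE)
a = select_action(forward(s)[1])
learn_step(s, a, -abs(s[0]), s + 0.01 * rng.normal(size=N_STATE), done=False)

In a full training loop, e1 and e2 would be reset to zero at the start of each balancing trial so that credit does not leak across episodes.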
Pages: 132-137
Page count: 5