Hardware implementation of the upper confidence-bound algorithm for reinforcement learning

Cited by: 4

Authors:

Radovic, Nevena [1]

Erceg, Milena [1]

Affiliations:

[1] Univ Montenegro, Elect Engn Dept, Cetinjski Put Bb, Podgorica 81000, Montenegro

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2021 / Vol. 96

Keywords:

FPGA; Hardware implementation; Machine learning; Multi-armed bandit problem; Upper confidence-bound algorithm; ARCHITECTURE;

DOI:

10.1016/j.compeleceng.2021.107537

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The upper confidence-bound algorithm has been identified as a popular and useful approach in reinforcement learning, suitable for solving diverse modern-day problems. In this paper, we have developed efficient, multiple-clock-cycle hardware for this algorithm to ensure its practical application in real time. A real-life situation that belongs to the class of problems commonly known as multi-armed bandit problems has been considered. The developed design is tested and verified using a field-programmable gate array circuit design. The obtained results have the same degree of accuracy as those achieved in software simulation, which proves the robustness of the developed solution. In terms of execution time, the proposed hardware implementation significantly outperforms the software simulation. Finally, the computational complexity of the implementation does not depend on the number of observed iterations, which guarantees the effective implementation of the developed design. All implementation details have been provided.

Pages: 9
