Hardware implementation of the upper confidence-bound algorithm for reinforcement learning

Cited by: 4
Authors
Radovic, Nevena [1]
Erceg, Milena [1]
Affiliations
[1] Univ Montenegro, Elect Engn Dept, Cetinjski Put Bb, Podgorica 81000, Montenegro
Keywords
FPGA; Hardware implementation; Machine learning; Multi-armed bandit problem; Upper confidence-bound algorithm; Architecture
DOI
10.1016/j.compeleceng.2021.107537
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The upper confidence-bound algorithm is a popular and useful approach in reinforcement learning, suitable for solving diverse modern-day problems. In this paper, we develop efficient, multiple-clock-cycle hardware for this algorithm to enable its practical application in real time. A real-life scenario belonging to the class of problems commonly known as multi-armed bandit problems is considered. The developed design is tested and verified using a field-programmable gate array (FPGA) circuit design. The obtained results match the accuracy of those achieved in software simulation, which proves the robustness of the developed solution. In terms of execution time, the proposed hardware implementation significantly outperforms the software simulation. Finally, the computational complexity of the implementation does not depend on the number of observed iterations, which guarantees the effective implementation of the developed design. All implementation details are provided.
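For orientation, the algorithm named in the abstract can be sketched in software. The block below is a minimal sketch of the textbook UCB1 rule for a multi-armed bandit, not the hardware design from the paper. The function name `ucb1`, the parameter `c`, and the use of reward callables are assumptions made for illustration. Because each arm keeps only a pull count and a running mean, the work per round depends on the number of arms and not on how many rounds have elapsed, which is one plausible reading of the abstract's remark about complexity.

```python
import math


def ucb1(reward_fns, n_rounds, c=2.0):
    """Textbook UCB1 sketch (hypothetical helper, not the paper's design).

    reward_fns: list of zero-argument callables, one per arm, each
                returning a numeric reward when that arm is pulled.
    n_rounds:   total number of pulls to perform.
    c:          exploration constant; c = 2 gives the classic UCB1 bonus.
    Returns (counts, means): pulls per arm and running mean reward per arm.
    """
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # number of times each arm was pulled
    means = [0.0] * n_arms   # running mean reward of each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once so counts are non-zero.
            arm = t - 1
        else:
            # Score = empirical mean + confidence bonus; pick the largest.
            scores = [
                means[a] + math.sqrt(c * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)

        reward = reward_fns[arm]()
        counts[arm] += 1
        # Incremental mean update: no reward history is stored.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts, means
```

Calling `ucb1([lambda: 0.1, lambda: 0.5, lambda: 0.9], 1000)` with these constant, purely illustrative rewards concentrates most pulls on the third arm while still sampling the others occasionally.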
Pages: 9