Hardware implementation of the upper confidence-bound algorithm for reinforcement learning

Cited by: 4
Authors
Radovic, Nevena [1]
Erceg, Milena [1]
Affiliations
[1] Univ Montenegro, Elect Engn Dept, Cetinjski Put Bb, Podgorica 81000, Montenegro
Keywords
FPGA; Hardware implementation; Machine learning; Multi-armed bandit problem; Upper confidence-bound algorithm; ARCHITECTURE
DOI
10.1016/j.compeleceng.2021.107537
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The upper confidence-bound algorithm is a popular and useful approach in reinforcement learning, suitable for solving diverse modern-day problems. In this paper, we develop efficient, multiple-clock-cycle hardware for this algorithm to enable its practical application in real time. A real-life situation belonging to the class of problems commonly known as multi-armed bandit problems is considered. The developed design is tested and verified using a field-programmable gate array circuit design. The obtained results match the accuracy of those achieved in software simulation, which proves the robustness of the developed solution. In terms of execution time, the proposed hardware implementation significantly outperforms the software simulation. Finally, the computational complexity of the implementation does not depend on the number of observed iterations, which guarantees the effective implementation of the developed design. All implementation details are provided.
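For orientation, the following is a minimal software sketch of the textbook UCB1 rule for a multi-armed bandit. It is not the hardware design from the paper, whose details are not given in this record. The class name `UCB1`, the helper `run`, the exploration constant `c`, and the Bernoulli reward model are all assumptions made for illustration. The incremental mean update shows one way the per-step cost can stay independent of the number of iterations already observed.

```python
import math
import random


class UCB1:
    """Textbook UCB1 agent (illustrative only, not the paper's design)."""

    def __init__(self, n_arms, c=2.0):
        self.counts = [0] * n_arms    # pulls per arm
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.total = 0                # total pulls so far
        self.c = c                    # exploration constant (assumed value)

    def select(self):
        # Pull every arm once before applying the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        log_t = math.log(self.total)
        scores = [
            mean + math.sqrt(self.c * log_t / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: constant work per step, no reward history stored.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(probs, steps, seed=0):
    """Simulate Bernoulli arms with the given success probabilities."""
    rng = random.Random(seed)
    agent = UCB1(len(probs))
    for _ in range(steps):
        arm = agent.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        agent.update(arm, reward)
    return agent
```

Calling `run([0.1, 0.5, 0.9], 2000)` returns an agent whose `counts` list shows how often each arm was pulled; with well-separated arms, most pulls should concentrate on the arm with the highest success probability.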
Pages: 9