Intelligent and Reconfigurable Architecture for KL Divergence-Based Multi-Armed Bandit Algorithms

Cited by: 8

Authors:

Santosh, S. V. Sai [1]

Darak, Sumit J. [1]

Affiliations:

[1] IIIT Delhi, Elect & Commun Dept, New Delhi 110020, India

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS | 2021 / Vol. 68 / Issue 03

Keywords:

Computer architecture; Heuristic algorithms; Optimization; Switches; Task analysis; Circuits and systems; Robots; Multi-armed bandit; intelligent architecture; Zynq platform; partial reconfiguration; AD-HOC;

DOI:

10.1109/TCSII.2020.3020634

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Multi-armed bandit (MAB) algorithms do not need any training phase and can be deployed directly in an unknown environment. They identify the best arm among several arms by balancing exploration of all arms against exploitation of the optimal arm. The Kullback-Leibler divergence-based upper confidence bound (KLUCB) is the state-of-the-art MAB algorithm that optimizes the exploration-exploitation trade-off, but it is complex due to its underlying optimization routine. This limits its usefulness for robotics and radio applications, which demand integration of KLUCB with the physical layer on a system on chip (SoC). In this brief, we efficiently map the KLUCB algorithm onto an SoC by realizing the optimization routine via an alternative synthesizable computation without compromising performance. The proposed architecture is dynamically reconfigurable such that the number of arms, as well as the type of algorithm, can be changed on-the-fly. Specifically, after initial learning, an on-the-fly switch to the lightweight UCB offers around a 10-fold improvement in latency and throughput. Since the learning duration depends on the unknown arm statistics, intelligence is embedded in the architecture to decide the switching instant. We validate the functional correctness and usefulness of the proposed work via a realistic wireless application, and a detailed complexity analysis demonstrates its feasibility in realizing intelligent radios.
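
For readers unfamiliar with why KLUCB is costlier than UCB, the following is a minimal software sketch, not the architecture described in the brief. It assumes Bernoulli rewards, an exploration budget of log(t), and a plain bisection search for the KLUCB index; the function names (`bernoulli_kl`, `klucb_index`, `ucb_index`) are hypothetical and chosen only for illustration. The abstract states that the authors replace the optimization routine with an alternative synthesizable computation, which this sketch does not attempt to reproduce.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def klucb_index(mean, pulls, t, iters=32):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t).

    Assumption: solved by bisection, which is the iterative
    optimization step that makes the index expensive to compute.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still feasible, move the lower bound up
        else:
            hi = mid  # mid violates the budget, shrink from above
    return lo


def ucb_index(mean, pulls, t):
    """UCB1 index: a closed-form expression with no inner loop."""
    return mean + math.sqrt(2 * math.log(t) / pulls)
```

In this sketch the KLUCB index needs one inner search loop per arm per round, while the UCB index is a single closed-form evaluation, which is consistent with the abstract's motivation for switching to UCB after the initial learning phase.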

Pages: 1008-1012

Page count: 5

References (7 entries in total):

[1] [Anonymous], 2011, P 24 ANN C LEARN THE
[2] Bouneffouf D., 2019, P INT JOINT C ART IN, P1
[3] Multi-Player Multi-Armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks
Darak, Sumit J.
Hanawal, Manjesh K.
[J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2019, 37 (10) : 2350 - 2363
[4] Lattimore T., 2018, PREPRINT
[5] Clustering in Multi-Channel Cognitive Radio Ad Hoc and Sensor Networks
Ozger, Mustafa
Alagoz, Fatih
Akan, Ozgur B.
[J]. IEEE COMMUNICATIONS MAGAZINE, 2018, 56 (04) : 156 - 162
[6] A survey of machine learning for big data processing
Qiu, Junfei
Wu, Qihui
Ding, Guoru
Xu, Yuhua
Feng, Shuo
[J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016,
[7] Introduction to Multi-Armed Bandits Preface
Slivkins, Aleksandrs
[J]. FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2019, 12 (1-2) : 1 - 286
