A Reinforcement-Learning-Based Distributed Resource Selection Algorithm for Massive IoT

Cited by: 15
Authors
Ma, Jing [1 ,3 ]
Hasegawa, So [1 ,3 ]
Kim, Song-Ju [2 ]
Hasegawa, Mikio [1 ,3 ]
Affiliations
[1] Tokyo Univ Sci, Dept Elect Engn, Tokyo 1258585, Japan
[2] Keio Univ, Grad Sch Media & Governance, Fujisawa, Kanagawa 2520882, Japan
[3] Tokyo Univ Sci, Grad Sch Engn, Dept Elect Engn, Katsushika Campus, 6-3-1 Niijyuku, Katsushika-ku, Tokyo, Japan
Source
APPLIED SCIENCES-BASEL | 2019, Vol. 9, Issue 18
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
reinforcement learning; multi-armed bandit; IoT; distributed channel selection;
DOI
10.3390/app9183730
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Massive IoT, comprising large numbers of resource-constrained IoT devices, has gained great attention. These devices generate enormous traffic, which causes network congestion. To manage this congestion, multi-channel-based algorithms have been proposed. However, most existing multi-channel algorithms require strict synchronization and extra overhead for negotiating channel assignment, which poses significant challenges for resource-constrained IoT devices. In this paper, a distributed channel selection algorithm based on tug-of-war (TOW) dynamics is proposed to improve successful frame delivery across the whole network by letting IoT devices adaptively select suitable channels for communication. The proposed TOW-dynamics-based channel selection algorithm uses a simple reinforcement learning procedure that needs only the acknowledgment (ACK) frame as feedback and requires minimal memory and computation capability; thus, it can run on resource-constrained IoT devices. We prototype the proposed algorithm on an extremely resource-constrained single-board computer, hereafter called the cognitive-IoT prototype. The prototypes are densely deployed in a frequently changing radio environment for evaluation experiments. The evaluation results show that the cognitive-IoT prototype accurately and adaptively selects suitable channels as the real radio environment varies; accordingly, the successful frame ratio of the network is improved.
Pages: 15
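
For readers who want a concrete picture of the learning procedure summarized in the abstract, the Python sketch below shows one plausible TOW-style multi-armed-bandit channel selector driven only by ACK feedback. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name TowChannelSelector, the reward constants (+1 on ACK, -omega otherwise), the uniform random noise standing in for the TOW oscillation term, and the demo's per-channel ACK probabilities are all illustrative.

import random

class TowChannelSelector:
    """Illustrative tug-of-war (TOW) bandit for channel selection.

    Keeps one learned estimate q[i] per channel; each channel is
    "pulled against" the average of the others (volume conservation),
    plus a fluctuation term for exploration. Assumes num_channels >= 2.
    """

    def __init__(self, num_channels, omega=1.0, amplitude=1.0):
        self.num_channels = num_channels
        self.omega = omega          # penalty weight on a missing ACK (assumed)
        self.amplitude = amplitude  # strength of the fluctuation term (assumed)
        self.q = [0.0] * num_channels

    def select(self):
        """Return the channel with the largest TOW displacement."""
        total = sum(self.q)
        n = self.num_channels
        scores = []
        for i in range(n):
            others = (total - self.q[i]) / (n - 1)
            # Random noise stands in for the oscillation term of the TOW model.
            noise = self.amplitude * random.uniform(-1.0, 1.0)
            scores.append(self.q[i] - others + noise)
        return scores.index(max(scores))

    def update(self, channel, ack_received):
        """Reinforce using only the one-bit ACK outcome of the frame."""
        if ack_received:
            self.q[channel] += 1.0
        else:
            self.q[channel] -= self.omega

if __name__ == "__main__":
    random.seed(0)
    ack_prob = [0.2, 0.5, 0.9, 0.4]  # hypothetical per-channel ACK probabilities
    selector = TowChannelSelector(num_channels=len(ack_prob))
    successes, trials = 0, 1000
    for _ in range(trials):
        ch = selector.select()
        ack = random.random() < ack_prob[ch]  # simulated transmission outcome
        selector.update(ch, ack)
        successes += ack
    print("successful frame ratio:", successes / trials)

The design point the abstract emphasizes is visible here: the only feedback is the one-bit ACK outcome, and the learner's entire state is one floating-point value per channel, which is why such a scheme can fit devices with minimal memory and computation capability.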