A Reinforcement-Learning-Based Distributed Resource Selection Algorithm for Massive IoT

Cited by: 15
Authors
Ma, Jing [1 ,3 ]
Hasegawa, So [1 ,3 ]
Kim, Song-Ju [2 ]
Hasegawa, Mikio [1 ,3 ]
Affiliations
[1] Tokyo Univ Sci, Dept Elect Engn, Tokyo 1258585, Japan
[2] Keio Univ, Grad Sch Media & Governance, Fujisawa, Kanagawa 2520882, Japan
[3] Tokyo Univ Sci, Grad Sch Engn, Dept Elect Engn, Katsushika Campus, 6-3-1 Niijyuku, Katsushika-ku, Tokyo, Japan
Source
APPLIED SCIENCES-BASEL | 2019, Vol. 9, Issue 18
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
reinforcement learning; multi-armed bandit; IoT; distributed channel selection;
DOI
10.3390/app9183730
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
Massive IoT, comprising large numbers of resource-constrained IoT devices, has gained great attention. These devices generate enormous traffic, which causes network congestion. To manage this congestion, multi-channel-based algorithms have been proposed. However, most existing multi-channel algorithms require strict synchronization and extra overhead for negotiating channel assignment, which poses significant challenges for resource-constrained IoT devices. In this paper, a distributed channel selection algorithm based on tug-of-war (TOW) dynamics is proposed to improve successful frame delivery across the whole network by letting IoT devices adaptively select suitable channels for communication. The proposed TOW-dynamics-based channel selection algorithm uses a simple reinforcement learning procedure that needs only the acknowledgment (ACK) frame as feedback and requires minimal memory and computation capability; thus, it can run on resource-constrained IoT devices. We prototype the proposed algorithm on an extremely resource-constrained single-board computer, hereafter called the cognitive-IoT prototype. The prototypes are densely deployed in a frequently changing radio environment for evaluation experiments. The evaluation results show that the cognitive-IoT prototype accurately and adaptively selects suitable channels as the real radio environment varies; accordingly, the successful frame ratio of the network is improved.
Pages: 15
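
For readers who want a concrete picture of the learning procedure summarized in the abstract, the Python sketch below shows one plausible TOW-style multi-armed-bandit channel selector driven only by ACK feedback. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name TowChannelSelector, the reward constants (+1 on ACK, -omega otherwise), the uniform random noise standing in for the TOW oscillation term, and the demo's per-channel ACK probabilities are all illustrative.

import random

class TowChannelSelector:
    """Illustrative tug-of-war (TOW) bandit for channel selection.

    Keeps one learned estimate q[i] per channel; each channel is
    "pulled against" the average of the others (volume conservation),
    plus a fluctuation term for exploration. Assumes num_channels >= 2.
    """

    def __init__(self, num_channels, omega=1.0, amplitude=1.0):
        self.num_channels = num_channels
        self.omega = omega          # penalty weight on a missing ACK (assumed)
        self.amplitude = amplitude  # strength of the fluctuation term (assumed)
        self.q = [0.0] * num_channels

    def select(self):
        """Return the channel with the largest TOW displacement."""
        total = sum(self.q)
        n = self.num_channels
        scores = []
        for i in range(n):
            others = (total - self.q[i]) / (n - 1)
            # Random noise stands in for the oscillation term of the TOW model.
            noise = self.amplitude * random.uniform(-1.0, 1.0)
            scores.append(self.q[i] - others + noise)
        return scores.index(max(scores))

    def update(self, channel, ack_received):
        """Reinforce using only the one-bit ACK outcome of the frame."""
        if ack_received:
            self.q[channel] += 1.0
        else:
            self.q[channel] -= self.omega

if __name__ == "__main__":
    random.seed(0)
    ack_prob = [0.2, 0.5, 0.9, 0.4]  # hypothetical per-channel ACK probabilities
    selector = TowChannelSelector(num_channels=len(ack_prob))
    successes, trials = 0, 1000
    for _ in range(trials):
        ch = selector.select()
        ack = random.random() < ack_prob[ch]  # simulated transmission outcome
        selector.update(ch, ack)
        successes += ack
    print("successful frame ratio:", successes / trials)

The design point the abstract emphasizes is visible here: the only feedback is the one-bit ACK outcome, and the learner's entire state is one floating-point value per channel, which is why such a scheme can fit devices with minimal memory and computation capability.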