An RL-based scheduling algorithm for video traffic in high-rate wireless personal area networks

Cited by: 4
Authors
Moradi, Shahab [1]
Mohsenian-Rad, A. Hamed [1]
Wong, Vincent W. S. [1]
Affiliation
[1] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V5Z 1M9, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Wireless personal area networks; Scheduling; QoS; Ultra-wide band (UWB); Markov decision process (MDP); Reinforcement learning; Reinforcement learning approach
DOI
10.1016/j.comnet.2009.07.012
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The emerging high-rate wireless personal area network (WPAN) technology is capable of supporting high-speed, high-quality real-time multimedia applications. In particular, video streams are expected to be a dominant traffic type and require quality of service (QoS) support. However, the current IEEE 802.15.3 standard for media access control (MAC) in high-rate WPANs leaves the implementation details of key functions such as scheduling and QoS provisioning unspecified. In this paper, we first propose a Markov decision process (MDP) model for optimal scheduling of video flows in high-rate WPANs. Using this model, we then propose a scheduler that incorporates a compact state space representation, function approximation, and reinforcement learning (RL). Simulation results show that the proposed RL scheduler achieves nearly optimal performance and a lower decoding failure rate than the F-SRPT, EDD+SRPT, and PAP scheduling algorithms. © 2009 Elsevier B.V. All rights reserved.
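To make the combination of a compact state representation, function approximation, and RL more concrete, the following is a minimal illustrative sketch and not the scheduler from the paper. It assumes a toy model in which each flow holds one frame described by its remaining packets and the slots left before its deadline, and it uses discounted Q-learning with a linear approximator as a stand-in technique; the abstract does not specify the paper's actual state space, features, reward, or learning rule. All names (`features`, `new_frame`, `step`, `train`) and all parameter values are hypothetical.

```python
import random

# Toy model (an assumption, not the paper's MDP): each flow holds one frame,
# represented as (remaining_packets, slots_until_deadline).

def features(flow):
    """Compact per-flow features: bias, remaining work, slack."""
    remaining, deadline = flow
    return [1.0, float(remaining), float(deadline - remaining)]

def q_value(weights, flow):
    """Linear approximation of the value of serving `flow` in this slot."""
    return sum(w * x for w, x in zip(weights, features(flow)))

def new_frame(rng):
    """Draw a hypothetical frame: 1-3 packets, deadline no tighter than its size."""
    remaining = rng.randint(1, 3)
    return (remaining, rng.randint(remaining, remaining + 3))

def choose_flow(weights, flows, rng, epsilon=0.1):
    """Epsilon-greedy choice of the flow to serve."""
    if rng.random() < epsilon:
        return rng.randrange(len(flows))
    return max(range(len(flows)), key=lambda i: q_value(weights, flows[i]))

def step(flows, action, rng):
    """Send one packet of the chosen flow, advance time by one slot.

    A frame that still has packets left when its deadline reaches zero counts
    as a decoding failure. Finished or expired frames are replaced at once.
    Returns the next flows and the reward (minus the number of failures).
    """
    failures = 0
    next_flows = []
    for i, (remaining, deadline) in enumerate(flows):
        if i == action:
            remaining -= 1
        deadline -= 1
        if remaining == 0 or deadline == 0:
            if remaining > 0:
                failures += 1
            remaining, deadline = new_frame(rng)
        next_flows.append((remaining, deadline))
    return next_flows, -float(failures)

def train(num_flows=3, slots=5000, alpha=0.01, gamma=0.9, seed=0):
    """Semi-gradient Q-learning over the linear weights."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    flows = [new_frame(rng) for _ in range(num_flows)]
    for _ in range(slots):
        action = choose_flow(weights, flows, rng)
        x = features(flows[action])
        q_sa = q_value(weights, flows[action])
        flows, reward = step(flows, action, rng)
        best_next = max(q_value(weights, f) for f in flows)
        delta = reward + gamma * best_next - q_sa  # temporal-difference error
        weights = [w + alpha * delta * xi for w, xi in zip(weights, x)]
    return weights

if __name__ == "__main__":
    print(train())
```

In this sketch the three learned weights replace a table indexed by the joint state of all flows, which is the sense in which function approximation keeps the representation compact.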
Pages: 2997-3010
Page count: 14