A reinforcement learning framework for path selection and wavelength selection in optical burst switched networks

Cited by: 43

Authors:

Kiran, Y. V. [1]

Venkatesh, T. [1]

Murthy, C. Siva Ram [1]

Affiliations:

[1] Indian Inst Technol, Dept Comp Sci & Engn, Madras 600036, Tamil Nadu, India

Source:

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS | 2007 / Vol. 25 / No. 09

Keywords:

optical burst switching; multi-armed bandit problem; Q-learning; path selection; wavelength selection;

DOI:

10.1109/JSAC-OCN.2007.028806

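Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: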

0808 ; 0809 ;

Abstract:

Optical Burst Switching (OBS) is a promising technology that exploits the benefits of optical communication and supports statistical multiplexing of data traffic at a fine granularity, making it a suitable technology for the next-generation Internet. Contention among bursts that arrive simultaneously at a core node leads to burst loss, which affects the throughput of higher-layer traffic. Developing efficient algorithms for path selection and wavelength selection is crucial to minimizing the burst loss probability (BLP) in OBS networks. In this paper, we formulate path selection and wavelength selection in OBS networks as multi-armed bandit problems and discuss the difficulties of solving them optimally. We propose algorithms based on Q-learning to solve these problems near-optimally. At an ingress node, the path selection algorithm evaluates the Q values for a set of precomputed paths and chooses the path that corresponds to the minimum BLP. Similarly, the Q-learning algorithm for wavelength selection selects a wavelength on a pre-routed path such that the BLP is minimized. We do not assume wavelength conversion or buffering at the core nodes; hence, selection of path and wavelength is done only at the edge nodes. We simulate the proposed algorithms under dynamic load to demonstrate that they reduce the BLP compared with other adaptive algorithms available in the literature.
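The idea in the abstract of treating path selection as a multi-armed bandit can be illustrated with a minimal sketch. It is not the authors' algorithm: the class name `BanditPathSelector`, the 0/1 loss feedback, the epsilon-greedy exploration, and the parameters `alpha` and `epsilon` are all assumptions made for illustration. Each precomputed path keeps one Q value that tracks its estimated burst loss probability, and the ingress node usually picks the path with the smallest estimate.

```python
import random


class BanditPathSelector:
    """Stateless Q-learning over a fixed set of precomputed paths.

    Hypothetical illustration only: the names, the 0/1 loss feedback and
    the epsilon-greedy rule are assumptions, not taken from the paper.
    """

    def __init__(self, num_paths, rng, alpha=0.01, epsilon=0.1):
        # q[i] is a running estimate of the loss probability on path i.
        # Starting at 0.0 is optimistic, so every path gets tried early.
        self.q = [0.0] * num_paths
        self.rng = rng
        self.alpha = alpha      # learning rate (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)

    def select(self):
        """Explore with probability epsilon, else pick the lowest estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return min(range(len(self.q)), key=self.q.__getitem__)

    def update(self, path, lost):
        """Move the estimate for `path` toward the observed outcome."""
        cost = 1.0 if lost else 0.0
        self.q[path] += self.alpha * (cost - self.q[path])


def simulate(loss_probs, bursts=20000, seed=0):
    """Toy environment: path i drops a burst with probability loss_probs[i]."""
    rng = random.Random(seed)
    selector = BanditPathSelector(len(loss_probs), rng)
    for _ in range(bursts):
        path = selector.select()
        lost = rng.random() < loss_probs[path]
        selector.update(path, lost)
    return selector


if __name__ == "__main__":
    selector = simulate([0.4, 0.02, 0.3])
    print([round(q, 3) for q in selector.q])
```

The constant learning rate lets the estimates follow a changing load, at the cost of some noise in each Q value. The same structure would apply to wavelength selection by treating the wavelengths on a pre-routed path as the arms.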


Pages: 18-26

Page count: 9
