Learning to Entangle Radio Resources in Vehicular Communications: An Oblivious Game-Theoretic Perspective

Cited: 12
Authors
Chen, Xianfu [1 ]
Wu, Celimuge [2 ]
Bennis, Mehdi [3 ]
Zhao, Zhifeng [4 ]
Han, Zhu [5 ,6 ]
Affiliations
[1] VTT Tech Res Ctr Finland, Oulu 1000, Finland
[2] Univ Electrocommun, Grad Sch Informat & Engn, Tokyo 1828585, Japan
[3] Univ Oulu, Ctr Wireless Commun, Oulu 90014, Finland
[4] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China
[5] Univ Houston, Houston, TX 77004 USA
[6] Kyung Hee Univ, Dept Comp Sci & Engn, Seoul 02447, South Korea
Funding
Academy of Finland; US National Science Foundation;
Keywords
Vehicle-to-vehicle communications; Markov decision process; stochastic games; Markov perfect equilibrium; oblivious equilibrium; reinforcement learning; NETWORK; INFORMATION;
DOI
10.1109/TVT.2019.2907589
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
In this paper, we investigate non-cooperative radio resource management in a vehicle-to-vehicle communication network. The technical challenges lie in high vehicle mobility and data traffic variations. Over the discrete scheduling slots, each vehicle user equipment (VUE) pair competes with the other VUE-pairs within the coverage of a road side unit (RSU) for the limited frequency resources to transmit queued data packets, aiming to optimize its expected long-term performance. The frequency allocation at the RSU at the beginning of each slot is regulated by a sealed second-price auction. The interactions among the VUE-pairs are modeled as a stochastic game with a semi-continuous global network state space. By defining a partitioned control policy, we transform the original game into an equivalent stochastic game with a global queue state space of finite size. We adopt an oblivious equilibrium (OE) to approximate the Markov perfect equilibrium, which characterizes the optimal solution to the equivalent game. The OE solution is theoretically proven to exhibit an asymptotic Markov equilibrium property. Due to the lack of a priori knowledge of network dynamics, we derive an online algorithm to learn the OE solution. Numerical simulations validate the theoretical analysis and show the effectiveness of the proposed online learning algorithm.
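The abstract states that the RSU regulates per-slot frequency allocation through a sealed second-price (Vickrey) auction, in which the highest bidder wins the resource but pays only the second-highest bid. A minimal sketch of such an allocation rule is given below; the VUE-pair identifiers and bid values are hypothetical, and the paper's actual mechanism operates over queue states and long-term utilities not modeled here.

```python
def second_price_auction(bids):
    """Allocate a single resource via a sealed second-price (Vickrey) auction.

    bids: dict mapping bidder id -> non-negative bid value.
    Returns (winner, price): the highest bidder wins and pays the
    second-highest bid (0.0 if there is only one bidder, None if no bids).
    """
    if not bids:
        return None, 0.0
    # Rank bidders by bid, highest first; Python's sort is stable, so
    # ties are broken by insertion order.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


# Hypothetical bids from three VUE-pairs for one scheduling slot:
winner, price = second_price_auction({"vue1": 3.0, "vue2": 5.5, "vue3": 2.0})
# winner is "vue2", who pays the second-highest bid, 3.0
```

The key property of this mechanism, which motivates its use here, is incentive compatibility: since the winner's payment does not depend on its own bid, truthfully bidding one's valuation is a dominant strategy.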
Pages: 4262-4274
Page count: 13