Continuous-Observation One-Sided Two-Player Zero-Sum Partially Observable Stochastic Game With Public Actions

被引：4

作者：

Zheng, Wei ^{[1
]}

Jung, Taeho ^{[2
]}

Lin, Hai ^{[1
]}

机构：

[1] Univ Notre Dame, Elect Engn Dept, Notre Dame, IN 46556 USA

[2] Univ Notre Dame, Comp Sci & Engn Dept, Notre Dame, IN 46556 USA

来源：

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2023年 / 68卷 / 12期

基金：

美国国家科学基金会;

关键词：

Game theory; Markov processes; stochastic game; stochastic optimal control;

D O I：

10.1109/TAC.2023.3276749

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

The one-sided two-player zero-sum partially observable stochastic games (OTZ-POSG) have emerged as popular models recently, especially in the cyber-security literature. In this game, the state is hidden for one player but directly observed by the other player for whom we call the informed player. All existing OTZ-POSG models assumed that the observation space is discrete and the action of the informed player is private. However, these assumptions may become invalid in real applications. For example, for the virtual machine migration techniques in moving target defenses, the observed traffic data of a network are in a continuous space and the switching strategy of the defender can be inferred by the attacker. This article, therefore, proposes a continuous-observation OTZ-POSG with public actions and studies the existence of its equilibrium. The main challenge induced by the public action is the potential information leakage-the action of the informed player could reveal state information because his or her policy is state dependent. To solve this issue, we adopt a two-step belief update strategy for players and prove the existence of a Stackelberg equilibrium. We show that the game can be solved iteratively through value iteration. However, calculating the exact value function is impractical as the observation space is continuous. To mitigate the computational complexity issue, we propose a point-based approximation algorithm to approximate the exact value function and meanwhile extend the dynamic partitioning approach to discretize the observation space into finite discrete partitions. We show that the value function of the leader can be approximated by a piece-wise linear and concave function with an error bound. Finally, examples are used in each section to illustrate ideas of the proposed algorithms.

引用

页码：7390 / 7404

页数：15

共 39 条

[1]

[Anonymous], 2009, P 22 INT FLAIRS C

[2]

[Anonymous], 2007, Artificial intelligence and statistics

[3]

Basar T., 1982, DYNAMIC NONCOOPERATI

[4] FINITE- AND INFINITE-HORIZON SHAPLEY GAMES WITH NONSYMMETRIC PARTIAL OBSERVATION [J].

Basu, Arnab ;

Stettner, Lukasz .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2015, 53 (06) :3584-3619

[5] SEQUENTIAL STACKELBERG EQUILIBRIA IN 2-PERSON GAMES [J].

BRETON, M ;

ALJ, A ;

HAURIE, A .

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1988, 59 (01) :71-97

[6]

Conitzer V, 2006, P 7 ACM C ELECT COMM, P82, DOI DOI 10.1145/1134707.1134717

[7]

Debroy S, 2016, 2016 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS (ICNC)

[8] Game Theory for Cyber Security and Privacy [J].

Do, Cuong T. ;

Tran, Nguyen H. ;

Hong, Choongseon ;

Kamhoua, Charles A. ;

Kwiat, Kevin A. ;

Blasch, Erik ;

Ren, Shaolei ;

Pissinou, Niki ;

Iyengar, Sundaraja Sitharama .

ACM COMPUTING SURVEYS, 2017, 50 (02) :30-37

[9]

Emery-Montemerlo R., 2004, P 3 INT JOINT C AUT, P136

[10] A Puzzle-Based Defense Strategy Against Flooding Attacks Using Game Theory [J].

Fallah, Mehran S. .

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2010, 7 (01) :5-19

← 1 2 3 4 →