Control Theory Meets POMDPs: A Hybrid Systems Approach

Cited by: 2
Authors
Ahmadi, Mohamadreza [1]
Jansen, Nils [2]
Wu, Bo [3]
Topcu, Ufuk [3]
Affiliations
[1] CALTECH, Ctr Autonomous Syst & Technol, Pasadena, CA 91125 USA
[2] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Dept Software Sci, NL-6500 GL Nijmegen, Netherlands
[3] Univ Texas Austin, Oden Inst Computat Engn & Sci, Austin, TX 78712 USA
Keywords
Safety; Decision making; Markov processes; Kalman filters; Control theory; Bayes methods; Switched systems; Artificial intelligence; Autonomous systems; Lyapunov methods; Optimization; Approximations; Stability
DOI
10.1109/TAC.2020.3035755
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of sequential decision-making scenarios under uncertainty in artificial intelligence (AI). Since the states of a POMDP are not directly observable, decisions have to be based on the output of a Bayesian filter (continuous beliefs), which makes POMDPs intractable to solve and analyze. To overcome this complexity challenge, we apply techniques from control theory. Our contributions are fourfold. 1) We begin by casting the problem of analyzing a POMDP as that of analyzing the behavior of a discrete-time switched system. 2) Then, to estimate the reachable belief space of a POMDP, i.e., the set of all possible belief evolutions given an initial belief distribution over the states and a set of actions and observations, we find overapproximations in terms of sublevel sets of Lyapunov-like functions. 3) Furthermore, to verify safety and performance requirements of a given POMDP, we formulate a barrier certificate theorem: if there exists a barrier certificate satisfying a set of inequalities along the solutions to the belief update equation of the POMDP, then the safety and performance properties are guaranteed to hold. In both 2) and 3), the calculations can be decomposed and solved in parallel. 4) Finally, we show that the conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method on two problems, in active ad scheduling and machine teaching.
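As a rough illustration of contribution 1), the sketch below implements the standard Bayesian belief update for a finite POMDP and treats each (action, observation) pair as one mode of a discrete-time switched system. It is a minimal sketch, not the authors' implementation: the two-state model, the names `belief_update` and `run_switched_system`, and all probabilities are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Belief = List[float]
Mode = Tuple[str, str]  # (action, observation): selects one subsystem


def belief_update(belief: Sequence[float],
                  T_a: Sequence[Sequence[float]],
                  O_ao: Sequence[float]) -> Belief:
    """One Bayesian filter step for a fixed (action, observation) pair.

    T_a[s][s2] = P(s2 | s, a) and O_ao[s2] = P(o | s2, a).
    """
    n = len(belief)
    # Predict with the transition model, then weight by the observation likelihood.
    unnormalized = [
        O_ao[s2] * sum(T_a[s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state model (illustrative numbers, not from the paper).
T: Dict[str, List[List[float]]] = {
    "wait": [[0.9, 0.1], [0.2, 0.8]],
}
O: Dict[Mode, List[float]] = {
    ("wait", "ping"): [0.7, 0.3],
    ("wait", "silent"): [0.3, 0.7],
}


def run_switched_system(b0: Sequence[float],
                        modes: Sequence[Mode]) -> List[Belief]:
    """Evolve the belief under a given switching sequence of modes."""
    trajectory = [list(b0)]
    for action, obs in modes:
        nxt = belief_update(trajectory[-1], T[action], O[(action, obs)])
        trajectory.append(nxt)
    return trajectory


trajectory = run_switched_system(
    [0.5, 0.5], [("wait", "ping"), ("wait", "silent")]
)
```

Each entry of `trajectory` is a point in the belief simplex. The reachable belief space mentioned in 2) is the set of all such points over every admissible switching sequence, which is what the Lyapunov-like sublevel sets overapproximate.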
Pages: 5191-5204
Page count: 14