Online estimation of objective function for continuous-time deterministic systems

Cited by: 2
Authors
Asl, Hamed Jabbari [1 ]
Uchibe, Eiji [1 ]
Affiliations
[1] ATR Computational Neuroscience Laboratories, Department of Brain Robot Interface, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
Keywords
Objective function estimation; Deterministic systems; Data-driven solution; Continuous-time systems
DOI
10.1016/j.neunet.2024.106116
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We developed two online data-driven methods for estimating an objective function in continuous-time linear and nonlinear deterministic systems. The primary focus is the challenge posed by unknown input dynamics (the control mapping function) of the expert system, a critical element for an online solution of the problem. Our methods leverage both the learner's and the expert's data. The first approach, which is model-free, estimates the expert's policy and integrates it into the learner agent to approximate the objective function associated with the optimal policy. The second approach estimates the input dynamics from the learner's data and combines it with the expert's input-state observations to solve the objective function estimation problem. Compared with other methods for deterministic systems that rely on both the learner's and the expert's data, our approaches offer reduced complexity by eliminating the need to estimate an optimal policy after each objective function update. We analyze the convergence of the estimation techniques using Lyapunov-based methods, and numerical experiments validate the effectiveness of the developed methods.
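As a hedged illustration of the problem class described in the abstract (the linear-quadratic special case, not necessarily the paper's exact formulation), consider a continuous-time deterministic system with state x and input u, and a quadratic objective with unknown weights Q and R:

\[
\dot{x} = A x + B u, \qquad
J(u) = \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) dt .
\]

The optimal policy and the associated algebraic Riccati equation are

\[
u^{*}(x) = -R^{-1} B^{\top} P x, \qquad
A^{\top} P + P A - P B R^{-1} B^{\top} P + Q = 0 .
\]

In this setting, objective function estimation amounts to recovering weights (Q, R) whose optimal policy reproduces the expert's observed state-input trajectories. Because the optimal gain depends on the input matrix B, an unknown B must either be bypassed (the model-free route, via an estimate of the expert's policy) or identified from the learner's own data, which corresponds to the two approaches summarized above.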
Pages: 11