Initial State Interventions for Deconfounded Imitation Learning
Cited by: 0
Authors:
Pfrommer, Samuel [1]
Bai, Yatong [1]
Lee, Hyunin [1]
Sojoudi, Somayeh [1]
Affiliations:
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Source:
2023 62nd IEEE Conference on Decision and Control (CDC) | 2023
Keywords:
DOI:
10.1109/CDC49753.2023.10383252
Chinese Library Classification number:
TP [Automation technology, computer technology]
Discipline classification code:
0812
Abstract:
Imitation learning suffers from causal confusion. This phenomenon occurs when learned policies attend to features that do not causally influence the expert actions but are instead spuriously correlated. Causally confused agents produce low open-loop supervised loss but poor closed-loop performance upon deployment. We consider the problem of masking observed confounders in a disentangled representation of the observation space. Our novel masking algorithm leverages the usual ability to intervene in the initial system state, avoiding any requirement involving expert querying, expert reward functions, or causal graph specification. Under certain assumptions, we theoretically prove that this algorithm is conservative in the sense that it does not incorrectly mask observations that causally influence the expert; furthermore, intervening on the initial state serves to strictly reduce excess conservatism. The masking algorithm is applied to behavior cloning for two illustrative control systems: CartPole and Reacher.
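To make the idea of masking a confounder concrete, the following is a minimal toy sketch in Python, not the authors' algorithm. It assumes a hypothetical two-feature observation (a noisy state reading and a nuisance feature that tracks the expert action without causing it) and a linear least-squares behavior cloner. The mask is supplied by hand here; selecting the mask through initial state interventions, which is the contribution described in the abstract, is not reproduced. The names `fit_masked_bc`, `noisy_state`, and `nuisance` are illustrative assumptions.

```python
import numpy as np


def fit_masked_bc(obs, actions, mask):
    """Least-squares behavior cloning with masked observation features.

    mask[i] == 1 keeps feature i; mask[i] == 0 hides it from the policy.
    Returns one weight per feature, with masked weights exactly zero.
    """
    mask = np.asarray(mask, dtype=float)
    masked_obs = obs * mask  # zero out the hidden columns
    weights, *_ = np.linalg.lstsq(masked_obs, actions, rcond=None)
    return weights * mask


# Hypothetical toy data: the expert acts on the true state only.
rng = np.random.default_rng(0)
n = 2000
state = rng.normal(size=n)
expert_action = -2.0 * state

# The learner sees a noisy state reading plus a nuisance feature that is
# correlated with the expert action but does not influence the expert.
noisy_state = state + 0.5 * rng.normal(size=n)
nuisance = expert_action + 0.1 * rng.normal(size=n)
obs = np.column_stack([noisy_state, nuisance])

w_full = fit_masked_bc(obs, expert_action, mask=[1, 1])
w_masked = fit_masked_bc(obs, expert_action, mask=[1, 0])

print("weights without masking:", w_full)
print("weights with nuisance masked:", w_masked)
```

In this toy setting the unmasked fit places most of its weight on the nuisance feature, which gives a low supervised loss but leaves the policy relying on a signal that need not remain informative once the learned policy is deployed in closed loop. The masked fit is forced to rely on the state reading.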