Learning to View: Decision Transformers for Active Object Detection

Cited by: 4
Authors
Ding, Wenhao [1 ,2 ,3 ]
Majcherczyk, Nathalie [2 ]
Deshpande, Mohit [2 ]
Qi, Xuewei [2 ]
Zhao, Ding [3 ]
Madhivanan, Rajasimman [2 ]
Sen, Arnie [2 ]
Affiliations
[1] Amazon, Sunnyvale, CA USA
[2] Amazon Lab126, Sunnyvale, CA 94098 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023) | 2023
Keywords
ALGORITHMS;
DOI
10.1109/ICRA48891.2023.10160946
Chinese Library Classification
TP [Automation technology; computer technology];
Discipline Classification Code
0812;
Abstract
Active perception describes a broad class of techniques that couple planning and perception, moving the robot in ways that yield more information about the environment. In most robotic systems, however, perception is independent of motion planning: traditional object detection is passive, operating only on the images it receives. Results can be improved if planning consumes detection signals and moves the robot to collect views that maximize detection quality. In this paper, we use reinforcement learning (RL) to control the robot so that it obtains images that maximize detection quality. Specifically, we propose a Decision Transformer with online fine-tuning, which first optimizes the policy on a pre-collected expert dataset and then improves the learned policy by exploring better solutions in the environment. We evaluate the proposed method on an interactive dataset collected from an indoor-scenario simulator. Experimental results demonstrate that our method outperforms all baselines, including the expert policy and pure offline RL methods. We also provide exhaustive analyses of the reward distribution and observation space.
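The abstract is summary-level, but the core inference mechanism a Decision Transformer uses — conditioning action selection on a target return-to-go that shrinks by each observed reward — can be sketched briefly. The sketch below is not the paper's implementation: the stub policy, toy environment, and reward values are hypothetical stand-ins for the causal transformer and the detection-quality reward described in the paper.

```python
def stub_policy(rtg_seq, state_seq, action_seq):
    # Stand-in for the causal transformer: a real Decision Transformer
    # attends over the interleaved (return-to-go, state, action) tokens
    # and predicts the next action. Here we just cycle through four
    # discrete "move the camera" actions so the example is deterministic.
    return len(action_seq) % 4

def env_step(state, action):
    # Toy environment: action 0 ("move toward the object") earns a
    # detection-quality reward of 1.0; other moves earn nothing.
    return state, (1.0 if action == 0 else 0.0)

def dt_rollout(policy, step, initial_state, target_return, horizon=10):
    """Decision-Transformer-style inference loop: the policy is
    conditioned on a target return-to-go, which is decremented by
    the reward actually collected after every step."""
    rtgs, states, actions = [float(target_return)], [initial_state], []
    total = 0.0
    for _ in range(horizon):
        a = policy(rtgs, states, actions)
        s, r = step(states[-1], a)
        actions.append(a)
        total += r
        rtgs.append(rtgs[-1] - r)  # remaining return the policy must still earn
        states.append(s)
    return total, rtgs

total, rtgs = dt_rollout(stub_policy, env_step, 0, target_return=5.0)
```

With this deterministic stub, action 0 fires on 3 of the 10 steps, so the rollout collects a return of 3.0 and the final return-to-go is 5.0 − 3.0 = 2.0; online fine-tuning, as in the paper, would then keep improving the policy on rollouts like this one.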
Pages: 7140-7146
Page count: 7