Active Object Detection With Multistep Action Prediction Using Deep Q-Network

Cited by: 60

Authors:

Han, Xiaoning ^{[1,2,3]}

Liu, Huaping ^{[4,5]}

Sun, Fuchun ^{[4,5]}

Zhang, Xinyu ^{[6]}

Affiliations:

[1] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Liaoning, Peoples R China

[2] Univ Chinese Acad Sci, Shenyang 110016, Liaoning, Peoples R China

[3] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110016, Liaoning, Peoples R China

[4] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

[5] Beijing Natl Res Ctr Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China

[6] Tsinghua Univ, State Key Lab Automot Safety & Energy, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS | 2019 / Vol. 15 / No. 06

Funding:

U.S. National Science Foundation;

Keywords:

Active object detection; active vision; deep Q-learning network (DQN); dueling architecture; reinforcement learning; RECOGNITION;

DOI:

10.1109/TII.2019.2890849

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In recent years, great success has been achieved in visual object detection, one of the fundamental tasks in industrial intelligence. Most existing methods are designed for single, well-captured still images. In practical robotic applications, however, nuisances such as tiny scale, partial view, or occlusion mean that one still image may not contain enough information for object detection. An intelligent robot can adjust its viewpoint to obtain better images for detection, so active object detection becomes a very important perception strategy for intelligent robots. In this paper, active object detection is formulated as a sequential action decision process, and a deep reinforcement learning framework is established to solve it. Furthermore, a novel deep Q-learning network (DQN) with a dueling architecture is proposed. The network has two separate output channels: one predicts the action type and the other predicts the action range. By combining the two output channels, the action space is explored more efficiently. Several methods are extensively validated, and the results show that the proposed one obtains the best results and predicts actions in real time.
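The two-channel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each output channel is a dueling head using the standard mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·), and that the channels are combined by taking the best entry of each head independently. The abstract does not say how the channels are actually combined. The action labels, step counts, and numeric values below are hypothetical.

```python
import random


def dueling_q(value, advantages):
    """Combine a scalar state value and per-action advantages into Q-values.

    Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def select_action(q_type, q_range, epsilon=0.0, rng=random):
    """Return (type index, range index) chosen from two Q-value heads.

    With probability epsilon both indices are sampled uniformly (exploration);
    otherwise each head is maximised independently (an assumed combination rule).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_type)), rng.randrange(len(q_range))
    best_type = max(range(len(q_type)), key=q_type.__getitem__)
    best_range = max(range(len(q_range)), key=q_range.__getitem__)
    return best_type, best_range


# Hypothetical action space: the labels and step counts are assumptions.
ACTION_TYPES = ["forward", "backward", "left", "right"]
ACTION_RANGES = [1, 2, 3]

# Illustrative head outputs for a single state (made-up numbers).
q_type = dueling_q(0.5, [0.1, -0.2, 0.4, -0.3])
q_range = dueling_q(0.5, [0.0, 0.3, -0.3])

t, r = select_action(q_type, q_range)
print(ACTION_TYPES[t], ACTION_RANGES[r])
```

Under these assumptions, two heads of sizes 4 and 3 cover 12 joint actions with only 7 outputs, which is one plausible reading of why splitting type and range helps explore the action space more efficiently.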

Citation

Pages: 3723-3731

Page count: 9

