Deep active object recognition by joint label and action prediction

Cited by: 13

Authors:

Malmir, Mohsen [1]

Sikka, Karan [2]

Forster, Deborah [3]

Fasel, Ian [4]

Movellan, Javier R. [4]

Cottrell, Garrison W. [1]

Affiliations:

[1] Univ Calif San Diego, Comp Sci & Engn Dept, 9500 Gilman Dr, San Diego, CA 92093 USA

[2] Univ Calif San Diego, Elect & Comp Engn Dept, 9500 Gilman Dr, San Diego, CA 92093 USA

[3] Univ Calif San Diego, Qualcomm Inst, 9500 Gilman Dr, San Diego, CA 92093 USA

[4] Emotient com, 4435 Eastgate Mall, Suite 320, San Diego, CA 92121 USA

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2017 / Vol. 156

Keywords:

Active object recognition; Deep learning; Q-learning

DOI:

10.1016/j.cviu.2016.10.011

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

An active object recognition system has the advantage of acting in the environment to capture images that are more suited for training and lead to better performance at test time. In this paper, we utilize deep convolutional neural networks for active object recognition by simultaneously predicting the object label and the next action to be performed on the object with the aim of improving recognition performance. We treat active object recognition as a reinforcement learning problem and derive the cost function to train the network for joint prediction of the object label and the action. A generative model of object similarities based on the Dirichlet distribution is proposed and embedded in the network for encoding the state of the system. The training is carried out by simultaneously minimizing the label and action prediction errors using gradient descent. We empirically show that the proposed network is able to predict both the object label and the actions on GERMS, a dataset for active object recognition. We compare the test label prediction accuracy of the proposed model with Dirichlet and Naive Bayes state encoding. The results of experiments suggest that the proposed model equipped with Dirichlet state encoding is superior in performance, and selects images that lead to better training and higher accuracy of label prediction at test time. (C) 2016 Elsevier Inc. All rights reserved.
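The abstract contrasts two ways of encoding the system state from the sequence of per-view softmax outputs (Naive Bayes and Dirichlet) and trains a label head and an action head jointly. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. The Naive Bayes encoding treats views as independent evidence and sums log class probabilities. The Dirichlet encoding is assumed here to score each softmax vector under a per-class Dirichlet density with parameters `alphas[c]` (fixed by hand for illustration) and to sum those log-likelihoods across views. `joint_loss`, `gamma`, and `lam` are hypothetical names for a cross-entropy label term plus a one-step Q-learning temporal-difference (TD) error; the paper's exact cost function and reward are not reproduced.

```python
import math
from typing import List, Sequence


def normalize_log(scores: Sequence[float]) -> List[float]:
    """Renormalize unnormalized log-scores so that their exponentials sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def naive_bayes_update(log_belief: Sequence[float],
                       softmax_out: Sequence[float]) -> List[float]:
    """Naive Bayes state encoding: treat each view as independent evidence
    and add the log class probabilities of the new view to the running belief.
    Assumes softmax_out is strictly positive."""
    return normalize_log([b + math.log(p) for b, p in zip(log_belief, softmax_out)])


def dirichlet_log_pdf(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Log density of a point x on the probability simplex under Dir(alpha)."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))


def dirichlet_update(log_belief: Sequence[float],
                     softmax_out: Sequence[float],
                     alphas: Sequence[Sequence[float]]) -> List[float]:
    """Assumed Dirichlet state encoding: alphas[c] models the softmax vectors
    produced by views of an object of class c, and each new view adds
    log Dir(softmax_out; alphas[c]) to the belief for class c."""
    return normalize_log([b + dirichlet_log_pdf(softmax_out, alphas[c])
                          for c, b in enumerate(log_belief)])


def joint_loss(label_probs: Sequence[float], true_label: int,
               q_values: Sequence[float], action: int, reward: float,
               next_q_values: Sequence[float],
               gamma: float = 0.9, lam: float = 1.0) -> float:
    """Hypothetical joint objective: cross-entropy on the label head plus a
    squared one-step Q-learning TD error on the action head, weighted by lam.
    In a gradient-based trainer the TD target would typically be held constant."""
    label_term = -math.log(label_probs[true_label])
    td_target = reward + gamma * max(next_q_values)
    td_error = td_target - q_values[action]
    return label_term + lam * td_error ** 2
```

For example, starting from a uniform belief over two classes, a single view whose softmax output is [0.7, 0.3] yields a Naive Bayes belief of [0.7, 0.3], whereas `dirichlet_update` with the hand-picked `alphas=[[5.0, 1.0], [1.0, 5.0]]` places about 0.97 on class 0, because the per-class Dirichlet densities model how the classifier's outputs are distributed rather than taking them at face value.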


Pages: 128-137

Number of pages: 10
