Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network

Cited by: 34

Authors:

Lin, Chien-Ming [1]

Tsai, Chi-Yi [1]

Lai, Yu-Cheng [1]

Li, Shin-An [1]

Wong, Ching-Chang [1]

Affiliations:

[1] Tamkang Univ, Dept Elect & Comp Engn, New Taipei 251, Taiwan

Source:

IEEE SENSORS JOURNAL | 2018 / Vol. 18 / No. 22

Keywords:

Deep learning; convolution neural networks; semantic segmentation; pose estimation;

DOI:

10.1109/JSEN.2018.2870957

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In recent years, deep learning-based object recognition algorithms have emerged in robotic vision applications. This paper addresses the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator to handle random object picking tasks. The proposed visual control system consists of a visual perception module, an object pose estimation module, a data augmentation module, and a robot manipulator controller. The visual perception module combines deep convolutional neural networks (CNNs) and a fully connected conditional random field layer to realize an image semantic segmentation function, which can provide stable and accurate object classification results in cluttered environments. The object pose estimation module implements a model-based pose estimation method to estimate the 3D pose of the target for picking control. In addition, the proposed data augmentation module automatically generates training data for training the deep CNN. Experimental results show that the proposed scene segmentation method used in the data augmentation module reaches a high accuracy rate of 97.10% on average, which is higher than other state-of-the-art segmentation methods. Moreover, with the proposed data augmentation module, the visual perception module reaches an accuracy rate of over 80% and 72% when detecting and recognizing one object and three objects, respectively. In addition, the proposed model-based pose estimation method provides accurate 3D pose estimation results. The average translation and rotation errors in the three axes are all smaller than 0.52 cm and 3.95 degrees, respectively. These advantages make the proposed visual control system suitable for applications of random object picking and manipulation.


Pages: 9370-9381

Page count: 12
