Joint object recognition and pose estimation using multiple-anchor triplet learning of canonical plane

Cited by: 1
Authors
Yoneda, Shunsuke [1 ]
Ueno, Kouki [1 ]
Irie, Go [2 ]
Nishiyama, Masashi [1 ]
Iwai, Yoshio [1 ]
Affiliations
[1] Tottori Univ, Grad Sch Engn, 101 Minami 4 Chome,Koyama Cho, Tottori 6808550, Japan
[2] NTT Corp, 1 Morinosato Wakamiya 3 Chome, Atsugi, Kanagawa 2430198, Japan
Keywords
Object recognition; Pose estimation; Multiple anchor; SIMILARITY;
DOI
10.1016/j.patrec.2021.11.005
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Accurate object recognition and pose estimation models are essential for practical applications of robot arms, such as picking products from a shelf. Training such a model often requires a large-scale dataset with high-quality labels for both object classes and pose parameters, and collecting accurate pose labels is particularly costly. A recent paper, Ueno et al. (2019) [28], proposed a triplet learning framework for joint object recognition and pose estimation without explicit pose labels by learning a spatial transformer network that estimates the pose difference of an input image from an anchor image depicting the same object in a reference pose. However, our analysis suggests that pose estimation accuracy is severely degraded for input images with large pose differences. To address this problem, we propose a new learning approach called multiple-anchor triplet learning. The basic idea is to provide dense reference poses by preparing multiple anchors so that there is at least one anchor image with a small pose difference from the input image. Our multiple-anchor triplet learning extends standard single-anchor triplet learning to the multiple-anchor case. Inspired by the idea of multiple instance learning, we introduce a selection layer that automatically chooses the best anchor for each input image and allows the network to be trained end-to-end to minimize triplet-based losses. Experiments on three benchmark datasets in product-picking scenarios demonstrate that our method significantly outperforms existing methods in both object recognition and pose estimation accuracy. (c) 2021 Elsevier B.V. All rights reserved.
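The anchor-selection idea from the abstract can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: the actual selection layer operates on learned embeddings inside an end-to-end network with spatial transformers, whereas the sketch below picks the nearest of several anchor embeddings (the MIL-style selection) and applies a standard triplet hinge. All function and variable names are hypothetical.

```python
import numpy as np

def multi_anchor_triplet_loss(x, anchors, negative, margin=1.0):
    """Triplet loss with MIL-style anchor selection (illustrative sketch).

    x        : embedding of the input image, shape (d,)
    anchors  : embeddings of K anchor images of the same object in
               different reference poses, shape (K, d)
    negative : embedding of an image of a different object, shape (d,)

    The selection step keeps only the anchor closest to the input,
    mimicking a selection layer that routes each input to the anchor
    with the smallest pose difference.
    """
    # Distances from the input to every anchor.
    d_pos = np.linalg.norm(anchors - x, axis=1)
    # MIL-style selection: keep only the best (nearest) anchor.
    best = int(np.argmin(d_pos))
    # Distance to the negative example.
    d_neg = float(np.linalg.norm(negative - x))
    # Standard triplet hinge computed on the selected anchor only.
    loss = max(0.0, d_pos[best] - d_neg + margin)
    return loss, best
```

Because the selection is a hard argmin, gradients flow only through the chosen anchor during training, which is what lets each input be supervised by the reference pose it is closest to.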
Pages: 372-381
Number of pages: 10
Related papers
32 records in total
  • [1] Ahmed E., 2015, Proc. IEEE CVPR, p. 3908. DOI: 10.1109/CVPR.2015.7299016
  • [2] Andrews S., 2002, NIPS.
  • [3] Araki R., 2018, Proc. Late Breaking Research Posters.
  • [4] Balntas V., Doumanoglou A., Sahin C., Sock J., Kouskouridas R., Kim T.-K. Pose Guided RGBD Feature Learning for 3D Object Pose Estimation. Proc. IEEE ICCV, 2017, pp. 3876-3884.
  • [5] Bromley J., 1993, Int. J. Pattern Recognition and Artificial Intelligence, vol. 7, p. 669. DOI: 10.1142/S0218001493000339
  • [6] Bui M., 2018, Proc. IEEE ICRA, p. 6140. DOI: 10.1109/ICRA.2018.8460654
  • [7] Calli B., 2015, Proc. 17th Int. Conf. on Advanced Robotics (ICAR), p. 510. DOI: 10.1109/ICAR.2015.7251504
  • [8] Chen D.-Y., Tian X.-P., Shen Y.-T., Ming O.-Y. On visual similarity based 3D model retrieval. Computer Graphics Forum, 2003, vol. 22, no. 3, pp. 223-232.
  • [9] Chen T., 2020, Proc. Machine Learning Research, vol. 119.
  • [10] Chen W., Chen X., Zhang J., Huang K. Beyond triplet loss: a deep quadruplet network for person re-identification. Proc. 30th IEEE CVPR, 2017, pp. 1320-1329.