Joint object recognition and pose estimation using multiple-anchor triplet learning of canonical plane

Cited by: 1
Authors
Yoneda, Shunsuke [1 ]
Ueno, Kouki [1 ]
Irie, Go [2 ]
Nishiyama, Masashi [1 ]
Iwai, Yoshio [1 ]
Affiliations
[1] Tottori Univ, Grad Sch Engn, 101 Minami 4 Chome,Koyama Cho, Tottori 6808550, Japan
[2] NTT Corp, 1 Morinosato Wakamiya 3 Chome, Atsugi, Kanagawa 2430198, Japan
Keywords
Object recognition; Pose estimation; Multiple anchor; SIMILARITY;
DOI
10.1016/j.patrec.2021.11.005
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Accurate object recognition and pose estimation models are essential for practical applications of robot arms, such as picking products from a shelf. Training such a model often requires a large-scale dataset with high-quality labels for both object classes and pose parameters, and collecting accurate pose labels is particularly costly. A recent paper, Ueno et al. (2019) [28], proposed a triplet learning framework for joint object recognition and pose estimation without explicit pose labels by learning a spatial transformer network that estimates the pose difference of an input image from an anchor image depicting the same object in a reference pose. However, our analysis suggests that pose estimation accuracy is severely degraded for input images with large pose differences. To address this problem, we propose a new learning approach called multiple-anchor triplet learning. The basic idea is to provide dense reference poses by preparing multiple anchors so that there is at least one anchor image with a small pose difference from the input image. Our multiple-anchor triplet learning extends standard single-anchor triplet learning to the multiple-anchor case. Inspired by the idea of multiple instance learning, we introduce a selection layer that automatically chooses the best anchor for each input image and allows the network to be trained end-to-end to minimize triplet-based losses. Experiments on three benchmark datasets in product-picking scenarios demonstrate that our method significantly outperforms existing methods in both object recognition and pose estimation accuracy. (c) 2021 Elsevier B.V. All rights reserved.
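The anchor-selection idea from the abstract can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: the actual selection layer operates on learned embeddings inside an end-to-end network with spatial transformers, whereas the sketch below picks the nearest of several anchor embeddings (the MIL-style selection) and applies a standard triplet hinge. All function and variable names are hypothetical.

```python
import numpy as np

def multi_anchor_triplet_loss(x, anchors, negative, margin=1.0):
    """Triplet loss with MIL-style anchor selection (illustrative sketch).

    x        : embedding of the input image, shape (d,)
    anchors  : embeddings of K anchor images of the same object in
               different reference poses, shape (K, d)
    negative : embedding of an image of a different object, shape (d,)

    The selection step keeps only the anchor closest to the input,
    mimicking a selection layer that routes each input to the anchor
    with the smallest pose difference.
    """
    # Distances from the input to every anchor.
    d_pos = np.linalg.norm(anchors - x, axis=1)
    # MIL-style selection: keep only the best (nearest) anchor.
    best = int(np.argmin(d_pos))
    # Distance to the negative example.
    d_neg = float(np.linalg.norm(negative - x))
    # Standard triplet hinge computed on the selected anchor only.
    loss = max(0.0, d_pos[best] - d_neg + margin)
    return loss, best
```

Because the selection is a hard argmin, gradients flow only through the chosen anchor during training, which is what lets each input be supervised by the reference pose it is closest to.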
Pages: 372-381
Number of pages: 10
Related papers
32 records in total
  • [1] Ahmed E., 2015, Proc. IEEE CVPR, p. 3908. DOI: 10.1109/CVPR.2015.7299016
  • [2] Andrews S., 2002, NIPS.
  • [3] Araki R., 2018, Proc. Late Breaking Research Posters.
  • [4] Balntas V., Doumanoglou A., Sahin C., Sock J., Kouskouridas R., Kim T.-K. Pose Guided RGBD Feature Learning for 3D Object Pose Estimation. Proc. IEEE ICCV, 2017, pp. 3876-3884.
  • [5] Bromley J., 1993, Int. J. Pattern Recognition and Artificial Intelligence, vol. 7, p. 669. DOI: 10.1142/S0218001493000339
  • [6] Bui M., 2018, Proc. IEEE ICRA, p. 6140. DOI: 10.1109/ICRA.2018.8460654
  • [7] Calli B., 2015, Proc. 17th Int. Conf. on Advanced Robotics (ICAR), p. 510. DOI: 10.1109/ICAR.2015.7251504
  • [8] Chen D.-Y., Tian X.-P., Shen Y.-T., Ming O.-Y. On visual similarity based 3D model retrieval. Computer Graphics Forum, 2003, vol. 22, no. 3, pp. 223-232.
  • [9] Chen T., 2020, Proc. Machine Learning Research, vol. 119.
  • [10] Chen W., Chen X., Zhang J., Huang K. Beyond triplet loss: a deep quadruplet network for person re-identification. Proc. 30th IEEE CVPR, 2017, pp. 1320-1329.