Simultaneous Geometry and Pose Estimation of Held Objects Via 3D Foundation Models

Cited by: 0
Authors
Zhi, Weiming [1]
Tang, Haozhan [1]
Zhang, Tianyi [1]
Johnson-Roberson, Matthew [1]
Affiliations
[1] Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024, Vol. 9, No. 12
Keywords
Robots; Robot kinematics; Cameras; Three-dimensional displays; Image reconstruction; Geometry; Solid modeling; Robot vision systems; Pose estimation; Grippers; Robot manipulators; robot learning; computer vision; CALIBRATION
DOI
10.1109/LRA.2024.3501677
Chinese Library Classification (CLC)
TP24 [Robotics]
Discipline Classification Code
080202; 1405
Abstract
Humans have the remarkable ability to use held objects as tools to interact with their environment, internally estimating how hand movements affect an object's movement. We wish to endow robots with this capability. We contribute a method to jointly estimate the geometry and pose of objects grasped by a robot from RGB images captured by an external camera. Notably, our method transforms the estimated geometry into the robot's coordinate frame without requiring the extrinsic parameters of the external camera to be calibrated. Our approach leverages 3D foundation models, large models pre-trained on huge datasets for 3D vision tasks, to produce initial estimates of the in-hand object. These initial estimates lack physically correct scale and are expressed in the camera's frame. We then formulate, and efficiently solve, a coordinate-alignment problem to recover the correct scale, along with a transformation of the object into the robot's coordinate frame. Forward kinematics mappings can subsequently be defined from the manipulator's joint angles to specified points on the object. These mappings enable points on the held object to be estimated at arbitrary configurations, allowing robot motion to be designed with respect to coordinates on the grasped object. We empirically evaluate our approach on a manipulator holding a diverse set of real-world objects.
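The coordinate-alignment step described in the abstract, recovering a physically correct scale and a transform from the camera's frame into the robot's frame, can be illustrated with a closed-form similarity alignment. Below is a minimal sketch, not the paper's actual formulation: it assumes point correspondences are available (for instance, gripper keypoints whose robot-frame positions are known from forward kinematics) and applies the classic Umeyama solution; the function name align_similarity and the synthetic data are purely illustrative.

```python
# Minimal sketch (an assumption, not the authors' implementation) of a
# scale-plus-rigid coordinate alignment. Given N corresponding points,
# P_cam (camera frame, arbitrary scale) and Q_robot (robot frame),
# recover s, R, t minimising ||s * R @ p + t - q||^2 (Umeyama, 1991).
import numpy as np

def align_similarity(P_cam: np.ndarray, Q_robot: np.ndarray):
    mu_p, mu_q = P_cam.mean(axis=0), Q_robot.mean(axis=0)
    Pc, Qc = P_cam - mu_p, Q_robot - mu_q
    # SVD of the cross-covariance gives the optimal rotation.
    U, S, Vt = np.linalg.svd(Qc.T @ Pc)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # rule out reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / np.sum(Pc**2)  # optimal isotropic scale
    t = mu_q - s * R @ mu_p
    return s, R, t

# Synthetic check: recover a known similarity transform.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))                 # unscaled camera-frame points
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
s_true, t_true = 2.5, np.array([0.1, -0.2, 0.4])
Q = s_true * P @ R_true.T + t_true           # corresponding robot-frame points
s, R, t = align_similarity(P, Q)
assert np.allclose(s, s_true) and np.allclose(R, R_true) and np.allclose(t, t_true)

# With the object geometry now in the robot's frame (and fixed relative to the
# gripper), a point p_ee expressed in the end-effector frame follows the arm
# through forward kinematics: p_base(q) = T_base_ee(q) @ p_ee.
```

Once the object geometry is expressed in the robot's frame, and hence rigidly attached to the end-effector, the forward kinematics mapping from joint angles to points on the object reduces to the rigid-transform composition noted in the final comment above.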
Pages: 11818-11825
Page count: 8