AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

Cited by: 14
Authors
Ohkawa, Takehiko [1, 2]
He, Kun [1]
Sener, Fadime [1]
Hodan, Tomas [1]
Tran, Luan [1]
Keskin, Cem [1]
Affiliations
[1] Meta Reality Labs, Tokyo, Japan
[2] Univ Tokyo, Tokyo, Japan
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.01249
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. The dataset includes synchronized egocentric and exocentric images sampled from the recent Assembly101 dataset, in which participants assemble and disassemble take-apart toys. To obtain high-quality 3D hand pose annotations for the egocentric images, we develop an efficient pipeline, where we use an initial set of manual annotations to train a model to automatically annotate a much larger dataset. Our annotation model uses multi-view feature fusion and an iterative refinement scheme, and achieves an average keypoint error of 4.20 mm, which is 85% lower than the error of the original annotations in Assembly101. AssemblyHands provides 3.0M annotated images, including 490K egocentric images, making it the largest existing benchmark dataset for egocentric 3D hand pose estimation. Using this data, we develop a strong single-view baseline of 3D hand pose estimation from egocentric images. Furthermore, we design a novel action classification task to evaluate predicted 3D hand poses. Our study shows that having higher-quality hand poses directly improves the ability to recognize actions.
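The abstract reports annotation quality as an average keypoint error in millimetres and as a relative reduction against the original Assembly101 annotations. The following is a minimal sketch of how such figures are commonly computed, assuming the error is the mean Euclidean distance between predicted and ground-truth 3D keypoints; the function names and toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def mean_keypoint_error(pred, gt):
    """Mean Euclidean distance between matching 3D keypoints.

    Assumption: pred and gt are equal-length sequences of (x, y, z)
    tuples in the same unit (e.g. millimetres).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(new_error, old_error):
    """Fraction by which new_error is lower than old_error."""
    return (old_error - new_error) / old_error


# Toy data (hypothetical): three keypoints with distances 5, 5 and 0.
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (10.0, 0.0, 5.0), (0.0, 10.0, 0.0)]
err = mean_keypoint_error(pred, gt)  # (5 + 5 + 0) / 3

# 28.0 is only the baseline implied by "4.20 mm is 85% lower";
# the abstract does not state the baseline value itself.
reduction = relative_reduction(4.20, 28.0)
```

Under this reading, a 4.20 mm error that is 85% lower than a baseline implies a baseline of roughly 4.20 / 0.15 = 28 mm, which is an inference from the two stated numbers rather than a figure given in the record.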
Pages: 12999-13008
Page count: 10
Related papers
50 in total
  • [1] 3D Hand Pose Estimation in Everyday Egocentric Images
    Prakash, Aditya
    Tu, Ruisen
    Chang, Matthew
    Gupta, Saurabh
    COMPUTER VISION - ECCV 2024, PT LXXVIII, 2025, 15136 : 183 - 202
  • [2] Hand PointNet-based 3D Hand Pose Estimation in Egocentric RGB-D Images
    Le, Van-Hung
    Hoang, Van-Nam
    Vu, Hai
    Le, Thi-Lan
    Tran, Thanh-Hai
    Vu, Viet-Vu
    PROCEEDINGS OF 2020 13TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR COMMUNICATIONS (ATC 2020), 2020, : 215 - 220
  • [3] 3D Hand Pose Detection in Egocentric RGB-D Images
    Rogez, Gregory
    Khademi, Maryam
    Supancic, J. S., III
    Montiel, J. M. M.
    Ramanan, Deva
    COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 356 - 371
  • [4] 3D Hand Pose Estimation via Graph-Based Reasoning
    Song, Jae-Hun
    Kang, Suk-Ju
    IEEE ACCESS, 2021, 9 : 35824 - 35833
  • [5] 3D Human Pose Estimation Using Egocentric Depth Data
    Baek, Seongmin
    Gil, Youn-Hee
    Kim, Yejin
    30TH ACM SYMPOSIUM ON VIRTUAL REALITY SOFTWARE AND TECHNOLOGY, VRST 2024, 2024,
  • [6] Enhancing egocentric 3D pose estimation with third person views
    Dhamanaskar, Ameya
    Dimiccoli, Mariella
    Corona, Enric
    Pumarola, Albert
    Moreno-Noguer, Francesc
    PATTERN RECOGNITION, 2023, 138
  • [7] Scene-aware Egocentric 3D Human Pose Estimation
    Wang, Jian
    Luvizon, Diogo
    Xu, Weipeng
    Liu, Lingjie
    Sarkar, Kripasindhu
    Theobalt, Christian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 13031 - 13040
  • [8] Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos
    Wen, Yilin
    Pan, Hao
    Yang, Lei
    Pan, Jia
    Komura, Taku
    Wang, Wenping
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21243 - 21253
  • [9] Dense 3D Regression for Hand Pose Estimation
    Wan, Chengde
    Probst, Thomas
    Van Gool, Luc
    Yao, Angela
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5147 - 5156
  • [10] Temporal Hints in 3D Hand Pose Estimation
    Yu, Taidong
    Cao, Zhiguo
    Xiao, Yang
    Zhang, Boshen
    Zhu, Zihao
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 2042 - 2047