HandGCAT: Occlusion-Robust 3D Hand Mesh Reconstruction from Monocular Images

Cited by: 1
Authors
Wang, Shuaibing [1 ,2 ]
Wang, Shunli [1 ,2 ]
Yang, Dingkang [1 ,2 ]
Li, Mingcheng [1 ,2 ]
Qian, Ziyun [1 ,2 ]
Su, Liuzhen [1 ,2 ]
Zhang, Lihua [1 ,2 ,3 ,4 ]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China
[2] IPASS, Inst Meta Med, Shanghai, Peoples R China
[3] Jilin Prov Key Lab Intelligence Sci & Engn, Changchun, Peoples R China
[4] AI & Unmanned Syst Engn Res Ctr Jilin Prov, Changchun, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Funding
National Key R&D Program of China
Keywords
3D hand mesh reconstruction; hand-object occlusion; computer vision;
DOI
10.1109/ICME55011.2023.00425
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a robust and accurate method for reconstructing 3D hand meshes from monocular images. This is a challenging problem because hands are often severely occluded by objects. Previous works have often disregarded 2D hand pose information, which contains hand prior knowledge that is strongly correlated with occluded regions. In this work, we therefore propose HandGCAT, a novel 3D hand mesh reconstruction network that fully exploits the hand prior as compensation information to enhance features of occluded regions. Specifically, we design a Knowledge-Guided Graph Convolution (KGC) module and a Cross-Attention Transformer (CAT) module. KGC extracts hand prior information from the 2D hand pose via graph convolution. CAT fuses the hand prior into occluded regions by exploiting their high correlation. Extensive experiments on popular datasets with challenging hand-object occlusions, such as HO3D v2, HO3D v3, and DexYCB, demonstrate that HandGCAT achieves state-of-the-art performance. The code is available at https://github.com/heartStrive/HandGCAT.
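
The abstract describes a two-stage fusion: a graph convolution over the 2D hand-pose skeleton (KGC) produces per-joint prior features, which a cross-attention transformer (CAT) then injects into image features of occluded regions. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation (see the linked repository for the real code); the class names KGCSketch and CATSketch, the identity-adjacency placeholder, the 21-joint assumption, and all feature dimensions are illustrative choices.

```python
# Illustrative sketch only; the real HandGCAT code is in the official repository.
import torch
import torch.nn as nn


class KGCSketch(nn.Module):
    """Toy knowledge-guided graph convolution: lifts 2D joint coordinates
    into per-joint prior features by propagating over a joint adjacency."""
    def __init__(self, num_joints: int = 21, out_dim: int = 256):
        super().__init__()
        # Placeholder adjacency (identity); a real model would use the
        # hand-skeleton connectivity, row-normalized.
        adj = torch.eye(num_joints)
        self.register_buffer("adj", adj / adj.sum(dim=-1, keepdim=True))
        self.fc = nn.Linear(2, out_dim)

    def forward(self, pose_2d: torch.Tensor) -> torch.Tensor:  # (B, 21, 2)
        x = self.fc(pose_2d)                                   # (B, 21, out_dim)
        return torch.relu(self.adj @ x)                        # graph propagation


class CATSketch(nn.Module):
    """Toy cross-attention fusion: image-feature tokens (queries) attend to
    joint-prior tokens (keys/values), so occluded regions can borrow
    information from the 2D-pose prior."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor,                # (B, N, dim)
                prior_tokens: torch.Tensor) -> torch.Tensor:   # (B, 21, dim)
        fused, _ = self.attn(img_tokens, prior_tokens, prior_tokens)
        return self.norm(img_tokens + fused)                   # residual fusion


if __name__ == "__main__":
    pose = torch.rand(2, 21, 2)       # detected 2D hand joints
    feats = torch.rand(2, 49, 256)    # e.g. 7x7 backbone feature map as tokens
    prior = KGCSketch()(pose)
    out = CATSketch()(feats, prior)
    print(out.shape)                  # torch.Size([2, 49, 256])
```

The sketch only shows the data flow implied by the abstract; it omits the mesh-regression head and any training details.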
Pages: 2495-2500
Number of pages: 6