Pose-Driven Compression for Dynamic 3D Human via Human Prior Models

Cited by: 0
Authors
Yan, Ruoke [1]
Yin, Qian [2]
Zhang, Xinfeng [2]
Zhang, Qi
Zhang, Gai
Ma, Siwei
Affiliations
[1] Peking Univ, Sch Comp Sci, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing 100049, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Three-dimensional displays; Point cloud compression; Bit rate; Image coding; Dynamics; Solid modeling; Data models; Dynamic 3D human compression; human prior models; pose-driven representation;
DOI
10.1109/TPAMI.2024.3368567
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To cost-effectively transmit high-quality dynamic 3D human content in immersive multimedia applications, efficient data compression is crucial. Unlike existing methods that focus on reducing signal-level reconstruction errors, we propose the first dynamic 3D human compression framework based on human priors. The layered coding architecture significantly enhances perceptual quality while also supporting a variety of downstream tasks, including visual analysis and content editing. Specifically, a high-fidelity pose-driven Avatar is generated from the original frames as the basic structure layer to implicitly represent the human shape. Then, human movements between frames are parameterized via a commonly used human prior model, i.e., the Skinned Multi-Person Linear Model (SMPL), to form the motion layer and drive the Avatar. Furthermore, normals are introduced as an enhancement layer to preserve fine-grained geometric details. Finally, the Avatar, SMPL parameters, and normal maps are efficiently compressed into layered semantic bitstreams. Extensive qualitative and quantitative experiments show that the proposed framework remarkably outperforms other state-of-the-art 3D codecs in terms of subjective quality at very low bitrates. More notably, as the size or frame count of the 3D human sequence increases, the framework's advantage in perceptual quality becomes more significant and its bitrate savings grow.
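The abstract's closing claim, that savings grow with the number of frames, follows from the layered design: a layer sent once is amortized over the whole sequence. The sketch below is only an illustration of that arithmetic, not the authors' codec. The SMPL parameter counts (72 pose values, 10 shape coefficients, 3 translation values) come from the standard SMPL model; everything else is an assumption made for the example, including the `LayerBudget` class, the byte sizes, the 16-bit quantization, sending the shape coefficients once, and sending one normal map per frame.

```python
from dataclasses import dataclass

# Standard SMPL parameter counts: 24 joints x 3 axis-angle values for pose,
# 10 shape coefficients, and 3 values for global translation.
SMPL_POSE_DIMS = 72
SMPL_SHAPE_DIMS = 10
SMPL_TRANSLATION_DIMS = 3


@dataclass
class LayerBudget:
    """Hypothetical byte budget per layer; all values are illustrative."""
    avatar_bytes: int             # structure layer, assumed sent once per sequence
    normal_bytes_per_frame: int   # enhancement layer, assumed sent every frame
    bytes_per_param: int = 2      # assumed 16-bit quantization of SMPL parameters


def motion_bytes_per_frame(budget: LayerBudget) -> int:
    """Bytes for the per-frame motion layer (pose plus translation)."""
    return (SMPL_POSE_DIMS + SMPL_TRANSLATION_DIMS) * budget.bytes_per_param


def total_bytes(budget: LayerBudget, num_frames: int) -> int:
    """Total size: one-time layers plus per-frame layers times frame count."""
    # Shape describes the subject, so this sketch assumes it is sent once.
    one_time = budget.avatar_bytes + SMPL_SHAPE_DIMS * budget.bytes_per_param
    per_frame = motion_bytes_per_frame(budget) + budget.normal_bytes_per_frame
    return one_time + num_frames * per_frame


def average_bytes_per_frame(budget: LayerBudget, num_frames: int) -> float:
    """Average cost per frame, which falls as the one-time cost is amortized."""
    return total_bytes(budget, num_frames) / num_frames


if __name__ == "__main__":
    budget = LayerBudget(avatar_bytes=1_000_000, normal_bytes_per_frame=2_000)
    for n in (1, 10, 100, 1000):
        print(n, round(average_bytes_per_frame(budget, n), 1))
```

Under these assumed numbers the average per-frame cost is dominated by the Avatar for short sequences and approaches the per-frame motion and normal cost for long ones, which is the trend the abstract describes.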
Pages: 5820-5834
Page count: 15