Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Cited by: 9
Authors
Suo, Yucheng [1]
Zheng, Zhedong [2, 3]
Wang, Xiaohan [1]
Zhang, Bang [4]
Yang, Yi [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, 38 Zheda Rd, Hangzhou 310027, Zhejiang, Peoples R China
[2] Univ Macau, Fac Sci & Technol, Taipa Univ Blvd, Macau 999078, Peoples R China
[3] Univ Macau, Inst Collaborat Innovat, Taipa Univ Blvd, Macau 999078, Peoples R China
[4] Alibaba Grp, DAMO Acad, 969 Wenyi West Rd, Hangzhou 311121, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sign language; motion transfer; video generation; jointly training; human pose estimation
DOI
10.1145/3648368
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Sign language provides a way for differently-abled individuals to express their feelings and emotions. However, learning sign language can be challenging and time-consuming. An alternative approach is to animate user photos using sign language videos of specific words, which can be achieved using existing image animation methods. However, the finger motions in the generated videos are often not ideal. To address this issue, we propose the Structure-aware Temporal Consistency Network (STCNet), which jointly optimizes the prior structure of humans with temporal consistency to produce sign language videos. We use a fine-grained skeleton detector to acquire knowledge of body structure and introduce both short- and long-term cycle loss to ensure the continuity of the generated video. The two losses and keypoint detector network are optimized in an end-to-end manner. Quantitative and qualitative evaluations on three widely used datasets, namely LSA64, Phoenix-2014T, and WLASL-2000, demonstrate the effectiveness of the proposed method. It is our hope that this work can contribute to future studies on sign language production.
Pages: 18