The Effect of Real-Time Constraints on Automatic Speech Animation

Cited by: 4
Authors
Websdale, Danny [1 ]
Taylor, Sarah [1 ]
Milner, Ben [1 ]
Affiliations
[1] Univ East Anglia, Norwich, Norfolk, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Real-time speech animation; automatic lip sync
DOI
10.21437/Interspeech.2018-2066
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Machine learning has previously been applied successfully to speech-driven facial animation. To account for carry-over and anticipatory coarticulation, a common approach is to predict the facial pose using a symmetric window of acoustic speech that includes both past and future context. Relying on future context limits the suitability of this approach for animating the faces of characters in real-time and networked applications, such as online gaming. An acceptable latency for conversational speech is 200ms, and network transmission times typically consume a significant part of this budget. Consequently, we consider asymmetric windows by investigating the extent to which decreasing the future context affects the quality of the predicted animation, using both deep neural networks (DNNs) and bi-directional LSTM recurrent neural networks (BiLSTMs). Specifically, we investigate future contexts ranging from 170ms (fully symmetric) down to 0ms (fully asymmetric). We find that a BiLSTM trained using 70ms of future context is able to predict facial motion of quality equivalent to that of a DNN trained with 170ms, while adding only 5ms of processing time. Subjective tests using the BiLSTM show that reducing the future context from 170ms to 50ms does not significantly decrease perceived realism. Below 50ms, perceived realism begins to deteriorate, creating a trade-off between realism and latency.
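To make the windowing idea concrete, below is a minimal sketch (not the authors' code) of mapping an asymmetric window of acoustic frames to a facial pose with a BiLSTM, as the abstract describes. It assumes 10ms acoustic frames, and the feature dimension (FEAT_DIM), pose dimension (POSE_DIM), and hidden size are hypothetical illustration values; PyTorch is used only as an example framework.

```python
# Minimal sketch of asymmetric-window BiLSTM lip-sync prediction.
# Assumptions (not from the paper): 10 ms frame hop, 40-d acoustic
# features, 30-d facial pose parameters, random input data.
import torch
import torch.nn as nn

FRAME_MS = 10                        # assumed hop size: one frame per 10 ms
PAST_MS, FUTURE_MS = 170, 70         # asymmetric window: full past, reduced future
N_PAST = PAST_MS // FRAME_MS         # 17 past frames
N_FUTURE = FUTURE_MS // FRAME_MS     # 7 future frames
FEAT_DIM, POSE_DIM = 40, 30          # hypothetical feature / pose sizes

class WindowedBiLSTM(nn.Module):
    """Map a window of acoustic frames to the pose at the current frame."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(FEAT_DIM, hidden, batch_first=True,
                           bidirectional=True)
        self.out = nn.Linear(2 * hidden, POSE_DIM)

    def forward(self, frames):       # frames: (batch, window_len, FEAT_DIM)
        h, _ = self.rnn(frames)      # h: (batch, window_len, 2 * hidden)
        centre = N_PAST              # index of the frame being animated
        return self.out(h[:, centre, :])

model = WindowedBiLSTM()
window = torch.randn(1, N_PAST + 1 + N_FUTURE, FEAT_DIM)
pose = model(window)
print(pose.shape)                    # torch.Size([1, 30])
```

At inference time the window slides forward one frame at a time, so the pose for the current frame can only be computed once all N_FUTURE future frames have arrived; that waiting time, plus model processing, is the latency the paper trades off against realism.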
Pages: 2479-2483
Page count: 5