The Effect of Real-Time Constraints on Automatic Speech Animation

Cited by: 4

Authors:

Websdale, Danny [1]

Taylor, Sarah [1]

Milner, Ben [1]

Affiliations:

[1] Univ East Anglia, Norwich, Norfolk, England

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Real-time speech animation; automatic lip sync;

DOI:

10.21437/Interspeech.2018-2066

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Machine learning has previously been applied successfully to speech-driven facial animation. To account for carry-over and anticipatory coarticulation, a common approach is to predict the facial pose using a symmetric window of acoustic speech that includes both past and future context. Using future context limits this approach for animating the faces of characters in real-time and networked applications, such as online gaming. An acceptable latency for conversational speech is 200 ms, and network transmission times will typically consume a significant part of this. Consequently, we consider asymmetric windows by investigating the extent to which decreasing the future context affects the quality of predicted animation using both deep neural networks (DNNs) and bi-directional LSTM recurrent neural networks (BiLSTMs). Specifically, we investigate future contexts from 170 ms (fully symmetric) to 0 ms (fully asymmetric). We find that a BiLSTM trained using 70 ms of future context is able to predict facial motion of equivalent quality to a DNN trained with 170 ms, while increasing processing time by only 5 ms. Subjective tests using the BiLSTM show that reducing the future context from 170 ms to 50 ms does not significantly decrease perceived realism. Below 50 ms, the perceived realism begins to deteriorate, generating a trade-off between realism and latency.
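The asymmetric windowing described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 10 ms frame hop, the edge-padding strategy, and the names `asymmetric_window`, `ms_to_frames`, and `lookahead_latency_ms` are all assumptions made for the example. The 170 ms past context is inferred from the abstract's description of 170 ms of future context as fully symmetric.

```python
from typing import List, Sequence

FRAME_HOP_MS = 10  # assumed hop between acoustic frames; not stated in the abstract


def ms_to_frames(context_ms: int, hop_ms: int = FRAME_HOP_MS) -> int:
    """Convert a context length in milliseconds to a whole number of frames."""
    return context_ms // hop_ms


def asymmetric_window(
    features: Sequence[Sequence[float]],
    t: int,
    past_ms: int,
    future_ms: int,
    hop_ms: int = FRAME_HOP_MS,
) -> List[float]:
    """Stack frames t - past .. t + future (inclusive) into one input vector.

    Indices outside the utterance are clamped to the first or last frame
    (edge padding), which is one common choice among several.
    """
    past = ms_to_frames(past_ms, hop_ms)
    future = ms_to_frames(future_ms, hop_ms)
    last = len(features) - 1
    window: List[float] = []
    for i in range(t - past, t + future + 1):
        clamped = min(max(i, 0), last)
        window.extend(features[clamped])
    return window


def lookahead_latency_ms(future_ms: int, processing_ms: int = 0) -> int:
    """Delay from waiting for future frames plus a hypothetical processing cost."""
    return future_ms + processing_ms


# Toy utterance: 50 frames with one feature each (the value is the frame index).
features = [[float(i)] for i in range(50)]

# 170 ms of past context and 70 ms of future context around frame 20.
window = asymmetric_window(features, t=20, past_ms=170, future_ms=70)
```

Under these assumptions the window around frame 20 spans 17 past frames, the current frame, and 7 future frames. Shrinking `future_ms` shortens only the right-hand side of the window, so the wait for future audio drops while the past context is unchanged. That wait is the part of the 200 ms conversational budget that the abstract's trade-off concerns, before network transmission is counted.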

Pages: 2479-2483

Page count: 5
