An Acoustic Signal Based Language Independent Lip Synchronization Method and Its Implementation via Extended LPC

Times Cited: 0
Authors
Cankurtaran, Halil Said [1]
Boyaci, Ali [2]
Yarkan, Serhan [1]
Affiliations
[1] Istanbul Ticaret Univ, Elekt Elekt Muhendisligi Bolumu, Kucukyali E5 Kavsagi, Inonu Cad, 4 Kucukyali, TR-34840 Istanbul, Turkey
[2] Istanbul Ticaret Univ, Bilgisayar Muhendisligi Bolumu, Kucukyali E5 Kavsagi, Inonu Cad, 4 Kucukyali, TR-34840 Istanbul, Turkey
Source
2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU) | 2020
Keywords
formant frequency; linear predictive coding; lip sync;
DOI
Not available
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Processing human speech with digital technologies gives rise to several important fields of research, with speech-to-text and lip-syncing among the most prominent. Applications include the audio-visualization of acoustic signals, real-time visual aids for people with disabilities, and text-free animation, to name a few. In this study, a language-independent lip-sync method based on extended linear predictive coding is proposed. The method operates on the baseband electrical signal acquired by a standard single-channel off-the-shelf microphone and exploits the statistical characteristics of the acoustic signals produced by human speech. The method is further implemented on an embedded system and tested, and its performance is evaluated. Results are given along with discussions and future directions.
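The abstract does not spell out the "extended" part of the LPC formulation, so the following is only a minimal sketch of the classical pipeline such a method builds on: per-frame autocorrelation (Yule-Walker) LPC analysis, formant estimation from the roots of the predictor polynomial, and a crude mapping from the first formant to a mouth-opening value for a lip-sync target. The frame length, model order, thresholds, and formant-to-mouth mapping below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: LPC formant extraction for lip-sync, assuming 16 kHz mono input.
import numpy as np

def lpc_coefficients(frame, order=12):
    """Estimate LPC coefficients with the autocorrelation (Yule-Walker) method."""
    frame = frame * np.hamming(len(frame))                 # taper frame edges
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz autocorrelation system R a = r[1..p]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])                 # predictor coefficients
    return np.concatenate(([1.0], -a))                     # A(z) = 1 - sum a_k z^-k

def formants(frame, fs=16000, order=12):
    """Return candidate formant frequencies (Hz) from the LPC polynomial roots."""
    a = lpc_coefficients(frame, order)
    roots = [z for z in np.roots(a) if np.imag(z) > 1e-6]  # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)           # pole angle -> frequency
    return sorted(f for f in freqs if 90.0 < f < fs / 2 - 50.0)  # drop near-DC/Nyquist

if __name__ == "__main__":
    # Hypothetical usage on one 20 ms frame: a 700 Hz tone roughly mimics the
    # first formant of an open vowel such as /a/.
    fs = 16000
    t = np.arange(int(0.02 * fs)) / fs
    frame = np.sin(2 * np.pi * 700.0 * t) + 0.01 * np.random.randn(t.size)
    f = formants(frame, fs)
    # Crude illustrative mapping: higher F1 -> wider mouth opening, capped at 1.
    mouth_open = min(f[0] / 1000.0, 1.0) if f else 0.0
    print("formants (Hz):", [round(x) for x in f[:3]], "mouth_open:", round(mouth_open, 2))
```

On an embedded target, the root-finding step is typically the most expensive part; replacing the linear solve with the Levinson-Durbin recursion and the root search with a coarse peak pick on the LPC spectral envelope is a common lightweight alternative.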
Pages: 4