Acoustic-to-Articulatory Inversion of a Three-dimensional Theoretical Vocal Tract Model Using Deep Learning-based Model

Cited by: 0
|
Authors
Lapthawan, Thanat [1 ]
Prom-on, Santitham [1 ]
Affiliations
[1] King Mongkut's University of Technology Thonburi, Department of Computer Engineering, Bangkok, Thailand
Source
2019 IEEE 10TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST 2019) | 2019
Keywords
articulatory mapping; articulatory synthesis; vocal tract model; deep learning;
DOI
10.1109/icawst.2019.8923588
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808; 0809;
Abstract
This paper presents an acoustic-to-articulatory mapping of a three-dimensional theoretical vocal tract model using deep learning methods. Prominent deep learning-based network structures are explored and evaluated for their suitability in capturing the relationship between acoustic and articulatory-oriented vocal tract parameters. The dataset was synthesized with VocalTractLab, a three-dimensional theoretical articulatory synthesizer, as pairs of acoustic signals, represented by Mel-frequency cepstral coefficients (MFCCs), and articulatory signals, represented by 23 vocal tract parameters. The utterances used in dataset generation were monosyllabic and disyllabic vowel articulations. Models were evaluated using the root-mean-square error (RMSE) and the coefficient of determination (R-squared, R-2). The deep artificial neural network (DNN) architecture, regularized with batch normalization, achieved the best performance on both inversion tasks: RMSE of 0.015 and R-2 of 0.970 for monosyllabic vowels, and RMSE of 0.015 and R-2 of 0.975 for disyllabic vowels. A comparison between the formants of sounds resynthesized from the inverted articulatory parameters and the original synthesized sounds shows no statistically significant difference between the original and estimated parameters. The results indicate that a deep learning-based model can effectively estimate articulatory parameters in a three-dimensional vocal tract model.
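The abstract reports RMSE and R-2 over the 23 vocal tract parameters. As a minimal sketch of how those two scores are computed, the snippet below pools all parameters into a single score; the exact aggregation used in the paper (per-parameter vs. pooled) is an assumption here, as are the synthetic data shapes.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, pooled over all samples and parameters."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination (R-squared), pooled over all values."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data: 200 frames of 23 vocal tract parameters
# (hypothetical shapes; the real targets come from VocalTractLab).
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 1.0, size=(200, 23))
y_pred = y_true + rng.normal(0.0, 0.02, size=y_true.shape)  # small estimation error

print("RMSE:", round(rmse(y_true, y_pred), 3))
print("R^2: ", round(r2(y_true, y_pred), 3))
```

With a small additive error the pooled RMSE stays near the noise scale and R-2 stays close to 1, mirroring the low-error regime the paper reports.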
Pages: 52-56
Page count: 5