Acoustic-to-Articulatory Inversion of a Three-dimensional Theoretical Vocal Tract Model Using Deep Learning-based Model

Cited by: 0
|
Authors
Lapthawan, Thanat [1 ]
Prom-on, Santitham [1 ]
Affiliations
[1] King Mongkut's University of Technology Thonburi, Department of Computer Engineering, Bangkok, Thailand
Source
2019 IEEE 10TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST 2019) | 2019
Keywords
articulatory mapping; articulatory synthesis; vocal tract model; deep learning;
DOI
10.1109/icawst.2019.8923588
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808; 0809;
Abstract
This paper presents an acoustic-to-articulatory mapping of a three-dimensional theoretical vocal tract model using deep learning methods. Prominent deep learning-based network structures are explored and evaluated for their suitability in capturing the relationship between acoustic and articulatory-oriented vocal tract parameters. The dataset was synthesized with VocalTractLab, a three-dimensional theoretical articulatory synthesizer, as pairs of acoustic signals, represented by Mel-frequency cepstral coefficients (MFCCs), and articulatory signals, represented by 23 vocal tract parameters. The utterances used in dataset generation were monosyllabic and disyllabic vowel articulations. Models were evaluated using the root-mean-square error (RMSE) and the coefficient of determination (R-squared, R-2). The deep artificial neural network (DNN) architecture, regularized with batch normalization, achieved the best performance on both inversion tasks: RMSE of 0.015 and R-2 of 0.970 for monosyllabic vowels, and RMSE of 0.015 and R-2 of 0.975 for disyllabic vowels. A comparison between the formants of sounds resynthesized from the inverted articulatory parameters and the original synthesized sounds shows no statistically significant difference between the original and estimated parameters. The results indicate that a deep learning-based model can effectively estimate articulatory parameters in a three-dimensional vocal tract model.
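The abstract reports RMSE and R-2 over the 23 vocal tract parameters. As a minimal sketch of how those two scores are computed, the snippet below pools all parameters into a single score; the exact aggregation used in the paper (per-parameter vs. pooled) is an assumption here, as are the synthetic data shapes.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, pooled over all samples and parameters."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination (R-squared), pooled over all values."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data: 200 frames of 23 vocal tract parameters
# (hypothetical shapes; the real targets come from VocalTractLab).
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 1.0, size=(200, 23))
y_pred = y_true + rng.normal(0.0, 0.02, size=y_true.shape)  # small estimation error

print("RMSE:", round(rmse(y_true, y_pred), 3))
print("R^2: ", round(r2(y_true, y_pred), 3))
```

With a small additive error the pooled RMSE stays near the noise scale and R-2 stays close to 1, mirroring the low-error regime the paper reports.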
Pages: 52-56
Page count: 5