Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Authors
Yu, Yide [1]
Shandiz, Amin Honarmandi [1]
Toth, Laszlo [1]
Affiliations
[1] Univ Szeged, Inst Informat, Szeged, Hungary
Source
29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021) | 2021
Keywords
Real-Time MRI; articulatory-to-acoustic mapping; deep learning; RECOGNITION; ARTICULOGRAPHY; SYSTEM;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable resolution. Here, we experiment with the reconstruction of the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error (MAE) of our networks, we also evaluate our models by comparing the speech signals obtained using several objective speech quality metrics, such as the mean cepstral distortion (MCD), Short-Time Objective Intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Signal-to-Distortion Ratio (SDR). The results indicate that our approach can successfully reconstruct the gross spectral shape, but more improvements are needed to reproduce the fine spectral details.
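As an illustration of two of the metrics named in the abstract, the following is a minimal sketch of MAE and MCD over frame-wise feature vectors. It is not the authors' evaluation code: the function names, the list-of-frames input layout, the `skip_c0` option, and the use of the commonly cited mel-cepstral form of MCD (in dB) are all assumptions made for this example.

```python
import math


def _check_shapes(pred, target):
    """Raise if the two sequences differ in frame count or frame width."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and have the same number of frames")
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("every frame pair must have the same dimension")


def mean_absolute_error(pred, target):
    """MAE averaged over all frames and all feature dimensions."""
    _check_shapes(pred, target)
    total = 0.0
    count = 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    return total / count


def mel_cepstral_distortion(pred, target, skip_c0=True):
    """Frame-averaged MCD in dB (assumed form, not taken from the paper).

    Per frame: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2).
    With skip_c0=True the 0th (energy) coefficient is excluded,
    which is a common convention but an assumption here.
    """
    _check_shapes(pred, target)
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    per_frame = []
    for p_frame, t_frame in zip(pred, target):
        sq_sum = sum((p - t) ** 2 for p, t in zip(p_frame[start:], t_frame[start:]))
        per_frame.append(scale * math.sqrt(sq_sum))
    return sum(per_frame) / len(per_frame)


if __name__ == "__main__":
    # Hypothetical toy data: two frames of three coefficients each.
    predicted = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
    reference = [[5.0, 1.0, 3.0], [5.0, 1.0, 2.0]]
    print("MAE:", mean_absolute_error(predicted, reference))
    print("MCD (dB):", mel_cepstral_distortion(predicted, reference))
```

Both functions assume the predicted and reference sequences are already time-aligned frame by frame; STOI, PESQ and SDR operate on waveforms and are usually computed with dedicated tools, so they are not sketched here.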
Pages: 945-949
Number of pages: 5