A silent speech system based on permanent magnet articulography and direct synthesis

Cited: 40
Authors
Gonzalez, Jose A. [1 ]
Cheah, Lam A. [2 ]
Gilbert, James M. [2 ]
Bai, Jie [2 ]
Ell, Stephen R. [3 ]
Green, Phil D. [1 ]
Moore, Roger K. [1 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S10 2TN, S Yorkshire, England
[2] Univ Hull, Sch Engn, Kingston Upon Hull, Yorks, England
[3] Hull & East Yorkshire Hosp Trust, Castle Hill Hosp, Cottingham, England
Funding
US National Institutes of Health;
Keywords
Silent speech interfaces; Speech rehabilitation; Speech synthesis; Permanent magnet articulography; Augmentative and alternative communication; MAXIMUM-LIKELIHOOD-ESTIMATION; VOICE CONVERSION; VOCAL-TRACT; RECOGNITION; EXTRACTION;
DOI
10.1016/j.csl.2016.02.002
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies. (C) 2016 Elsevier Ltd. All rights reserved.
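The abstract describes learning a speaker-dependent transformation from articulatory (PMA) features to acoustic features with a mixture of factor analysers. A closely related and widely used formulation is joint-density mixture regression, where a mixture model is fitted on stacked articulatory/acoustic vectors and acoustics are predicted as the conditional mean E[y | x]. The sketch below illustrates that idea with a plain Gaussian mixture rather than the paper's factor-analyser mixture; the data, dimensionalities, and component count are synthetic stand-ins, not the authors' setup.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins: 9-D "articulatory" and 12-D "acoustic" frames
# linked by a non-linear map (the real system uses PMA sensor data
# and spectral speech parameters).
DIM_X, DIM_Y, N = 9, 12, 2000
X = rng.normal(size=(N, DIM_X))
W = rng.normal(size=(DIM_X, DIM_Y))
Y = np.tanh(X) @ W + 0.1 * rng.normal(size=(N, DIM_Y))

# Fit a joint-density mixture on stacked [articulatory, acoustic] vectors.
gmm = GaussianMixture(n_components=8, covariance_type="full",
                      random_state=0).fit(np.hstack([X, Y]))

mu_x = gmm.means_[:, :DIM_X]           # per-component articulatory means
mu_y = gmm.means_[:, DIM_X:]           # per-component acoustic means
cov = gmm.covariances_
Sxx = cov[:, :DIM_X, :DIM_X]           # articulatory covariance blocks
Syx = cov[:, DIM_X:, :DIM_X]           # acoustic-articulatory cross blocks

def convert(x):
    """Predict acoustics as the conditional mean E[y | x] under the joint GMM."""
    # Responsibilities p(k | x) from the marginal mixture over x.
    logp = np.array([np.log(gmm.weights_[k])
                     + multivariate_normal.logpdf(x, mu_x[k], Sxx[k])
                     for k in range(gmm.n_components)])
    resp = np.exp(logp - logp.max())
    resp /= resp.sum()
    # Weighted sum of per-component conditional means
    # E[y | x, k] = mu_y_k + Syx_k Sxx_k^{-1} (x - mu_x_k).
    y_hat = np.zeros(DIM_Y)
    for k in range(gmm.n_components):
        y_hat += resp[k] * (mu_y[k]
                            + Syx[k] @ np.linalg.solve(Sxx[k], x - mu_x[k]))
    return y_hat

y_pred = convert(X[0])
```

At runtime, each incoming articulatory frame would be converted this way and the resulting acoustic parameter stream passed to a vocoder, avoiding any intermediate speech-recognition step, which matches the direct-synthesis design described in the abstract.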
Pages: 67-87
Page count: 21