A silent speech system based on permanent magnet articulography and direct synthesis

Cited: 40
Authors
Gonzalez, Jose A. [1 ]
Cheah, Lam A. [2 ]
Gilbert, James M. [2 ]
Bai, Jie [2 ]
Ell, Stephen R. [3 ]
Green, Phil D. [1 ]
Moore, Roger K. [1 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S10 2TN, S Yorkshire, England
[2] Univ Hull, Sch Engn, Kingston Upon Hull, Yorks, England
[3] Hull & East Yorkshire Hosp Trust, Castle Hill Hosp, Cottingham, England
Funding
US National Institutes of Health;
Keywords
Silent speech interfaces; Speech rehabilitation; Speech synthesis; Permanent magnet articulography; Augmentative and alternative communication; MAXIMUM-LIKELIHOOD-ESTIMATION; VOICE CONVERSION; VOCAL-TRACT; RECOGNITION; EXTRACTION;
DOI
10.1016/j.csl.2016.02.002
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies. (C) 2016 Elsevier Ltd. All rights reserved.
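The abstract describes learning a speaker-dependent transformation from parallel PMA and audio recordings, modelled with a mixture of factor analysers and applied at conversion time without an intermediate recognition step. As a rough illustration of the underlying joint-density regression idea, the sketch below fits a plain Gaussian mixture over stacked articulatory and acoustic feature vectors and converts new articulatory frames via the mixture's conditional mean. This is a simplified stand-in, not the paper's method: the paper's mixture of factor analysers additionally performs dimensionality reduction, and the data here are synthetic with hypothetical feature dimensions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: x plays the role of PMA articulatory features,
# y the role of acoustic (e.g. spectral) features. Dimensions are illustrative.
rng = np.random.default_rng(0)
n, dx, dy = 500, 9, 13
x = rng.normal(size=(n, dx))
y = x @ rng.normal(size=(dx, dy)) + 0.1 * rng.normal(size=(n, dy))

# Learn a joint density over stacked [x, y] vectors, as in GMM-based
# voice conversion (the paper replaces the GMM with factor analysers).
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(np.hstack([x, y]))


def convert(x_new):
    """Map articulatory frames to acoustic frames via the conditional mean
    E[y | x] under the fitted joint Gaussian mixture."""
    m = len(x_new)
    # Component responsibilities given x alone: log w_k + log N(x | mu_x, Sxx).
    log_resp = np.zeros((m, gmm.n_components))
    for k in range(gmm.n_components):
        mu_x = gmm.means_[k, :dx]
        Sxx = gmm.covariances_[k][:dx, :dx]
        diff = x_new - mu_x
        _, logdet = np.linalg.slogdet(Sxx)
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sxx), diff)
        log_resp[:, k] = (np.log(gmm.weights_[k])
                          - 0.5 * (logdet + maha + dx * np.log(2 * np.pi)))
    resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)

    # Responsibility-weighted sum of per-component conditional means:
    # mu_y + S_yx S_xx^{-1} (x - mu_x).
    y_hat = np.zeros((m, dy))
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :dx], gmm.means_[k, dx:]
        S = gmm.covariances_[k]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        cond_mean = mu_y + (x_new - mu_x) @ np.linalg.solve(Sxx, Sxy)
        y_hat += resp[:, [k]] * cond_mean
    return y_hat


y_pred = convert(x[:10])
```

In a real system the converted acoustic frames would then drive a vocoder to produce the audible speech signal; here `convert` simply returns the predicted feature matrix.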
Pages: 67-87
Page count: 21
Related papers (50 total)
  • [1] Analysis of Phonetic Similarity in a Silent Speech Interface based on Permanent Magnetic Articulography
    Gonzalez, Jose A.
    Cheah, Lam A.
    Bai, Jie
    Ell, Stephen R.
    Gilbert, James M.
    Moore, Roger K.
    Green, Phil D.
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1018 - 1022
  • [2] Integrating User-Centred Design in the Development of a Silent Speech Interface Based on Permanent Magnetic Articulography
    Cheah, Lam A.
    Gilbert, James M.
    Gonzalez, Jose A.
    Bai, Jie
    Ell, Stephen R.
    Fagan, Michael J.
    Moore, Roger K.
    Green, Phil D.
    Rybchenko, Sergey I.
    BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, BIOSTEC 2015, 2015, 574 : 324 - 337
  • [3] Vocoder-Based Speech Synthesis from Silent Videos
    Michelsanti, Daniel
    Slizovskaia, Olga
    Haro, Gloria
    Gomez, Emilia
    Tan, Zheng-Hua
    Jensen, Jesper
    INTERSPEECH 2020, 2020, : 3530 - 3534
  • [4] A pilot study on augmented speech communication based on Electro-Magnetic Articulography
    Heracleous, Panikos
    Badin, Pierre
    Bailly, Gerard
    Hagita, Norihiro
    PATTERN RECOGNITION LETTERS, 2011, 32 (08) : 1119 - 1125
  • [5] Power Control for Direct-Driven Permanent Magnet Wind Generator System with Battery Storage
    Guang, Chu Xiao
    Ying, Kong
    SCIENTIFIC WORLD JOURNAL, 2014,
  • [6] Evaluation of a Silent Speech Interface based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary
    Gonzalez, Jose A.
    Cheah, Lam A.
    Green, Phil D.
    Gilbert, James M.
    Ell, Stephen R.
    Moore, Roger K.
    Holdsworth, Ed
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3986 - 3990
  • [7] Speech Synthesis Parameter Generation for the Assistive Silent Speech Interface MVOCA
    Hofe, Robin
    Ell, Stephen R.
    Fagan, Michael J.
    Gilbert, James M.
    Green, Phil D.
    Moore, Roger K.
    Rybchenko, Sergey I.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3020 - +
  • [8] Inverse filter based excitation model for HMM-based speech synthesis system
    Reddy, Mittapalle Kiran
    Rao, Krothapalli Sreenivasa
    IET SIGNAL PROCESSING, 2018, 12 (04) : 544 - 548
  • [9] An HMM-based Cantonese Speech Synthesis System
    Wang, Xin
    Wu, Zhiyong
    2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012,
  • [10] A corpus-based speech synthesis system for Uyghur
    Silamu, Wushour
    Tursun, Nasirjan
    Tursun, Mamateli
    RECENT ADVANCE OF CHINESE COMPUTING TECHNOLOGIES, 2007, : 373 - 376