Mixed-modality speech recognition and interaction using a wearable artificial throat

Cited by: 0
Authors
Qisheng Yang
Weiqiu Jin
Qihang Zhang
Yuhong Wei
Zhanfeng Guo
Xiaoshi Li
Yi Yang
Qingquan Luo
He Tian
Tian-Ling Ren
Affiliations
[1] Tsinghua University, School of Integrated Circuits and Beijing National Research Center for Information Science and Technology (BNRist)
[2] Shanghai Jiao Tong University, Shanghai Lung Cancer Center, Shanghai Chest Hospital
[3] Shanghai Jiao Tong University, School of Medicine
Source
Nature Machine Intelligence | 2023 / Volume 5
Abstract
Researchers have recently been pursuing technologies for universal speech recognition and interaction that work well with subtle sounds or in noisy environments. Multichannel acoustic sensors can improve the accuracy of sound recognition but result in devices too large to wear. To solve this problem, we propose a graphene-based intelligent, wearable artificial throat (AT) that is sensitive to human speech and vocalization-related motions. Its perception of the mixed modalities of acoustic signals and mechanical motions enables the AT to acquire signals with a low fundamental frequency while remaining noise resistant. The experimental results showed that the mixed-modality AT can detect basic speech elements (phonemes, tones and words) with an average accuracy of 99.05%. We further demonstrated its interactive applications for speech recognition and voice reproduction for the vocally disabled. Through an ensemble AI model, it recognized everyday words vaguely spoken by a patient with laryngectomy with an accuracy of over 90%. The recognized content was synthesized into speech and played back on the AT, restoring the patient's capability for vocalization. Its feasible fabrication process, stable performance, resistance to noise and integrated vocalization make the AT a promising tool for next-generation speech recognition and interaction systems.
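The abstract mentions an ensemble AI model classifying words from mixed acoustic and mechanical signals, but this page gives no implementation details. Below is a minimal, illustrative sketch of one plausible pipeline, assuming scikit-learn, early fusion by feature concatenation, and synthetic stand-in data; the feature dimensions, the 20-word vocabulary, and the choice of base classifiers are all hypothetical, not the authors' method.

# Illustrative sketch of mixed-modality ensemble word classification.
# All data here is synthetic; the paper's actual model is not specified on this page.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

n_samples, n_classes = 600, 20                 # hypothetical everyday-word vocabulary
acoustic = rng.normal(size=(n_samples, 40))    # stand-in for acoustic (e.g. spectral) features
mechanical = rng.normal(size=(n_samples, 16))  # stand-in for low-frequency mechanical features
y = rng.integers(0, n_classes, size=n_samples)

# Early fusion: concatenate the two modality feature vectors per utterance.
X = np.hstack([acoustic, mechanical])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Soft-voting ensemble over heterogeneous base classifiers: each model
# predicts class probabilities, which are averaged before the final decision.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print(f"held-out accuracy: {ensemble.score(X_te, y_te):.3f}")

Early fusion keeps the sketch simple; a late-fusion variant, training one classifier per modality and combining their probabilities, would be an equally reasonable reading of "mixed-modality" and is easy to express with the same VotingClassifier interface.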
Pages: 169-180
Number of pages: 11