Speech Synthesis Using Ambiguous Inputs From Wearable Keyboards

Cited by: 0

Authors:

Iwasaki, Matsuri [1]

Hara, Sunao [1]

Abe, Masanobu [1]

Affiliation:

[1] Okayama Univ, Okayama, Japan

Source:

2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023

DOI:

10.1109/APSIPAASC58517.2023.10317228

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a new application of text-to-speech (TTS) in speech communication. The goal is to enable people with dysarthria, articulation disorders, or other difficulties in speaking to communicate anywhere and anytime, using speech to express their thoughts and feelings. Achieving this goal requires an input method, so we propose a new text-entry method based on three concepts. First, for portability, we use a wearable keyboard that inputs the decimal digits 0 to 9 according to the movements of the 10 fingers. Second, so that no training is needed, users enter sentences on the wearable keyboard as if touch typing, which yields a sequence of numbers corresponding to each sentence. Third, a neural machine translation (NMT) method is applied to estimate text from the sequence of numbers. The NMT model was trained on two datasets: a Japanese-English parallel corpus containing 2.8 million sentence pairs extracted from TV and movie subtitles, and a Japanese text dataset containing 32 million sentences extracted from a question-and-answer platform. Using the model, phonemes and accent symbols were estimated from sequences of numbers. The symbol-level accuracy was 91.48%, and 43.45% of all sentences were estimated with no errors. To subjectively evaluate the feasibility of the NMT model, a two-person word association game was conducted: one player gave hints using speech synthesized from the symbols estimated by NMT, while the other guessed the answers. As a result, 67.95% of all quizzes were answered correctly. The experimental results show that the proposed method has the potential to let people with dysarthria communicate through TTS using a wearable keyboard.
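To illustrate why the input described in the abstract is ambiguous, the following is a minimal sketch of how touch typing with ten fingers could collapse text into a digit sequence. The abstract does not give the actual finger-to-digit assignment or the keyboard layout, so everything here is an assumption for illustration: a standard QWERTY touch-typing layout, romanized input, fingers numbered 0 (left little finger) to 9 (right little finger), and the hypothetical names `FINGER_KEYS`, `encode`, and `group_by_code`.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical finger numbering, not taken from the paper.
# 0-3 = left little..index finger, 4 = left thumb (unused here),
# 5 = right thumb (space), 6-9 = right index..little finger.
FINGER_KEYS = {
    0: "qaz",
    1: "wsx",
    2: "edc",
    3: "rfvtgb",
    5: " ",
    6: "yhnujm",
    7: "ik",
    8: "ol",
    9: "p",
}

# Invert the table: each key maps to the digit of the finger that presses it.
KEY_TO_DIGIT = {key: digit for digit, keys in FINGER_KEYS.items() for key in keys}


def encode(text: str) -> str:
    """Convert text to the digit sequence its touch-typed keystrokes would produce.

    Characters that are not in the assumed layout are skipped.
    """
    return "".join(str(KEY_TO_DIGIT[ch]) for ch in text.lower() if ch in KEY_TO_DIGIT)


def group_by_code(words: list[str]) -> dict[str, list[str]]:
    """Group words that share the same digit sequence (the ambiguity to resolve)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    print(encode("konnichiwa"))
    print(group_by_code(["hum", "jun", "kan"]))
```

Under this assumed layout, several different strings map to the same digits, which is the many-to-one ambiguity that the NMT model in the abstract is trained to resolve. The sketch covers only the encoding direction and does not attempt the decoding step.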

Pages: 1172-1178

Page count: 7
