Semi-automated Speaker Adaptation: How to Control the Quality of Adaptation?

Cited by: 0

Authors:

Savchenko, Andrey V. [1]

Affiliations:

[1] Natl Res Univ, Higher Sch Econ, Nizhnii Novgorod, Russia

Source:

IMAGE AND SIGNAL PROCESSING, ICISP 2014 | 2014 / Vol. 8509

Keywords:

Automatic speech recognition; phoneme recognition; speaker adaptation; CMU Sphinx; voice control; linear autoregression model; RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Since the early 1990s, speaker adaptation has been one of the most intensively studied areas in speech recognition. State-of-the-art batch-mode adaptation algorithms assume that the speech of a particular speaker contains enough information about the user's voice. In this article we propose to let the user manually verify whether the adaptation is useful. Our procedure requires the speaker to pronounce syllables containing each vowel of a particular language. The algorithm consists of two steps that loop through all syllables. In the first step, LPC analysis is performed on the extracted vowel, and the LPC coefficients are used to synthesize a new sound (with a fixed pitch period) and play it back. If the user does not perceive this synthesized sound as the original one, the syllable should be recorded again. In the second step, the speaker is asked to produce another syllable with the same vowel to automatically verify the stability of pronunciation. If the two signals are close (in terms of the Itakura-Saito divergence), the sounds are marked as "good" for adaptation. Otherwise, both steps are repeated. In the experiment we examine vowel recognition for the Russian language in our voice control system, which fuses two classifiers: CMU Sphinx with a speaker-independent acoustic model, and a Euclidean comparison of the MFCC features of the model vowel and the input signal frames. Our results support the claim that the proposed approach provides better accuracy and reliability than the traditional MAP/MLLR techniques implemented in CMU Sphinx.
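
The two automatic ingredients of the procedure described in the abstract (LPC analysis with fixed-pitch resynthesis, and an Itakura-Saito comparison of two recordings of the same vowel) can be illustrated with a minimal pure-Python sketch. This is not the author's implementation: the function names, the LPC order, the number of spectral bins, the decision threshold, and the choice to symmetrize the divergence are all assumptions made here for illustration. The manual listening check is outside the scope of the code, and the sketch assumes non-silent input frames.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a frame (no normalization)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (a, err) with a[0] == 1, so that
    x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order] is the prediction
    residual, and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]  # assumed > 0, i.e. the frame is not silent
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def synthesize(a, n_samples, pitch_period):
    """Drive the all-pole filter 1/A(z) with a fixed-period impulse train."""
    out = []
    for n in range(n_samples):
        excitation = 1.0 if n % pitch_period == 0 else 0.0
        feedback = sum(a[k] * out[n - k]
                       for k in range(1, len(a)) if n - k >= 0)
        out.append(excitation - feedback)
    return out


def lpc_spectrum(a, gain, n_bins=128):
    """LPC power spectrum gain / |A(e^{jw})|^2 on n_bins points in [0, pi)."""
    spectrum = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        response = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        spectrum.append(gain / abs(response) ** 2)
    return spectrum


def itakura_saito(p, q):
    """Discrete Itakura-Saito divergence D(p || q) of two power spectra."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, q)) / len(p)


def is_stable_pronunciation(vowel_a, vowel_b, order=12, threshold=0.5):
    """Compare two recordings of the same vowel.

    Assumption: the divergence is symmetrized and compared with a
    hypothetical threshold; the source does not specify either choice.
    """
    a1, g1 = lpc(vowel_a, order)
    a2, g2 = lpc(vowel_b, order)
    p, q = lpc_spectrum(a1, g1), lpc_spectrum(a2, g2)
    d = 0.5 * (itakura_saito(p, q) + itakura_saito(q, p))
    return d < threshold, d
```

In use, the output of `synthesize(a, n_samples, pitch_period)` would be played back to the speaker for the perceptual check, and `is_stable_pronunciation` would be called on the vowel segments of the two recorded syllables; a `False` result would correspond to repeating both steps for that vowel.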

Citation

Pages: 638-646

Page count: 9

References (8 in total):

[1]

Benesty J., 2008, SPRINGER HDB SPEECH

[2]

Kim D.Y., 2004, ICSLP 2004

[3] Rapid speaker adaptation in eigenvoice space [J].

Kuhn, R ;

Junqua, JC ;

Nguyen, P ;

Niedzielski, N.

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (06) :695-707

[4]

Marple Jr S.L., 1989, PRENTICE HALL SERIES

[5] Phonetic encoding method in the isolated words recognition problem [J].

Savchenko, A. V.

JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS, 2014, 59 (04) :310-315

[6] Phonetic words decoding software in the problem of Russian speech recognition [J].

Savchenko, A. V.

AUTOMATION AND REMOTE CONTROL, 2013, 74 (07) :1225-1232

[7]

Savchenko L.V., 2013, LNCS, V7911, P176

[8] Discriminative cluster adaptive training [J].

Yu, Kai ;

Gales, Mark J. F.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05) :1694-1703
