Self-learning speaker identification for enhanced speech recognition

Cited by: 11

Authors:

Herbig, Tobias [1, 2]

Gerl, Franz [3]

Minker, Wolfgang [2]

Affiliations:

[1] Nuance Commun Aachen GmbH, Ulm, Germany

[2] Univ Ulm, Inst Informat Technol, Ulm, Germany

[3] SVOX Deutschland GmbH, Ulm, Germany

Source:

COMPUTER SPEECH AND LANGUAGE | 2012 / Vol. 26 / Issue 03

Keywords:

Speaker identification; Speech recognition; Speaker adaptation; ADAPTATION;

DOI:

10.1016/j.csl.2011.11.002

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface are achieved through the interaction of speaker identification, speech recognition and speaker adaptation for a limited number of recurring users. Together with a technique for efficient information retrieval, a compact model of speech and speaker characteristics is presented. Applying speaker-specific profiles allows speech recognition to take individual speech characteristics into account and thus achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation combined with robust speaker identification. The resulting self-learning speech-controlled system can track different users. Only a very short enrollment of each speaker is required. Subsequent utterances are used for unsupervised adaptation, resulting in continuously improving speech recognition rates. Additionally, the detection of unknown speakers is examined with the objective of avoiding the need to train new speaker profiles explicitly. The speech-controlled system presented here is suitable for in-car applications on embedded devices, e.g. speech-controlled navigation, hands-free telephony or infotainment systems. Results are presented for a subset of the SPEECON database. The results validate the benefit of the speaker adaptation scheme and the unified modeling in terms of speaker identification and speech recognition rates. (C) 2011 Elsevier Ltd. All rights reserved.


Pages: 210-227

Page count: 18

References (36 in total):

[1] Discriminative in-set/out-of-set speaker recognition [J]. Angkititrakul, Pongtep; Hansen, John H. L. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (02): 498-508

[2] [Anonymous], 2000, Pattern Classification

[3] Bisani M, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P409

[4] Botterweck H, 2001, INT CONF ACOUST SPEE, P353, DOI 10.1109/ICASSP.2001.940840

[5] CLASS F, 2003, Patent No. 20030187645

[6] CLASS F, 1993, EUROSPEECH 1993, P803

[7] FERRAS M, 2008, SPEAK LANG REC WORKS, P21

[8] Ferràs M, 2007, INT CONF ACOUST SPEE, P53

[9] Fortuna J., 2005, INTERSPEECH 2005, P1997

[10] Selected topics from 40 years of research on speech and speaker recognition [J]. Furui, Sadaoki. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 1-8
