Speaker selection training for large vocabulary continuous speech recognition

Cited by: 0

Authors:

Huang, C [1]

Chen, T [1]

Chang, E [1]

Affiliation:

[1] Microsoft Res Asia, Beijing 100080, Peoples R China

Source:

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS | 2002

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Acoustic variability across speakers is one of the challenges for speaker-independent (SI) speech recognition systems. Dominant speaker adaptation techniques such as MLLR and MAP are powerful, but they may become inefficient when not enough enrollment data is available. In this paper, we propose an adaptation method based on speaker selection training, which makes full use of the statistics of the training corpus. A relative error rate reduction of 5.31% is achieved when only one utterance is available. We compare different speaker selection strategies, namely PCA-, HMM- and GMM-based methods. In addition, we investigate the impact of the number of selected cohort speakers and of the number of utterances from the target speaker. Comparison and integration with MLLR adaptation are also shown. Finally, we discuss some ongoing work: dynamically varying the number of selected speakers, measuring the relative contribution of the selected speakers, and speeding up the computationally expensive re-estimation procedure with model synthesis.
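
As a rough illustration of the cohort selection idea mentioned in the abstract, the sketch below ranks training speakers by how well a per-speaker model explains the target speaker's frames and keeps the top k. It is not the authors' implementation: it assumes each training speaker is summarized by a single diagonal-covariance Gaussian (a one-component stand-in for a GMM), and the names `diag_gauss_loglik`, `select_cohort`, `speaker_models`, and the toy data are all hypothetical.

```python
import numpy as np


def diag_gauss_loglik(frames, mean, var):
    """Average per-frame log-likelihood of `frames` under a diagonal Gaussian."""
    diff = frames - mean
    per_frame = -0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var).sum(axis=1)
    return per_frame.mean()


def select_cohort(frames, speaker_models, k):
    """Return the k training speakers whose models best fit the target frames."""
    scores = {
        spk: diag_gauss_loglik(frames, mean, var)
        for spk, (mean, var) in speaker_models.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (assumption): 2-dimensional features, three training speakers.
speaker_models = {
    "spk_a": (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    "spk_b": (np.array([5.0, 5.0]), np.array([1.0, 1.0])),
    "spk_c": (np.array([0.5, -0.5]), np.array([1.0, 1.0])),
}

# Frames from a hypothetical target speaker who is acoustically close to spk_a and spk_c.
rng = np.random.default_rng(0)
target_frames = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

cohort = select_cohort(target_frames, speaker_models, k=2)
```

In a scheme of this kind, the statistics of the selected cohort speakers would then be used to re-estimate the acoustic model for the target speaker. That step is omitted here because the abstract does not specify it.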

Pages: 609-612

Page count: 4
