Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs

Cited by: 18

Authors:

Cetin, Oezguer [1]

Magimai-Doss, Mathew [2]

Livescu, Karen [3]

Kantor, Arthur [4]

King, Simon [5]

Bartels, Chris [6]

Frankel, Joe [5]

Affiliations:

[1] Yahoo Inc, Santa Clara, CA USA

[2] IDIAP, Res Inst, Martigny, Switzerland

[3] MIT, Cambridge, MA USA

[4] Univ Illinois, Urbana, IL USA

[5] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[6] Univ Washington, Seattle, WA 98195 USA

Source:

2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2 | 2007

Funding:

Swiss National Science Foundation; US National Science Foundation;

Keywords:

speech recognition; feedforward neural networks; hidden Markov models;

DOI:

10.1109/ASRU.2007.4430080

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, features derived from the posteriors of a multilayer perceptron (MLP), known as tandem features, have proven to be very effective for automatic speech recognition. Most tandem features to date have relied on MLPs trained for phone classification. We recently showed on a relatively small data set that MLPs trained for articulatory feature (AF) classification can be equally effective. In this paper, we provide a similar comparison using MLPs trained on a much larger data set: 2000 hours of English conversational telephone speech. We also explore how portable phone-based and AF-based tandem features are to an entirely different language, Mandarin, without any retraining. We find that while the phone-based features perform slightly better than AF-based features in the matched-language condition, they perform significantly better in the cross-language condition. However, in the cross-language condition, neither approach is as effective as the tandem features extracted from an MLP trained on a relatively small amount of in-domain data. Beyond feature concatenation, we also explore novel factored observation modeling schemes that allow for greater flexibility in combining the tandem and standard features.

Citation

Pages: 36 / +

Page count: 2
