Analysing Acoustic Model Changes for Active Learning in Automatic Speech Recognition

Times cited: 0

Authors:

Wu, Chenhao [1]

Ng, Raymond W. M. [1]

Torralba, Oscar Saz [1]

Hain, Thomas [1]

Affiliation:

[1] University of Sheffield, Speech and Hearing Research Group, Sheffield, South Yorkshire, England

Source:

2017 International Conference on Systems, Signals and Image Processing (IWSSIP) | 2017

Keywords:

Active learning; data selection; confidence measures; speaker adaptation

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

In active learning for Automatic Speech Recognition (ASR), a portion of the data is automatically selected for manual transcription. The objective is to improve ASR performance by retraining the acoustic models. Standard approaches select data based on the confidence scores of individual sentences. In this study, we take an alternative view of transcript label quality and use the Gaussian Supervector Distance (GSD) as the criterion for data selection. GSD is a metric that quantifies how much a model changes during adaptation. An out-of-domain acoustic model first produced automatic transcripts. These transcripts were used to perform unsupervised adaptation, and GSD was computed. The adapted model was then applied to an audiobook transcription task. GSD was found to provide useful cues for predicting the quality of data transcriptions. A preliminary active learning experiment shows that the GSD selection criterion is more effective than random selection, which points to its potential for future use.
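The abstract defines GSD only as a measure of how much the model changes during adaptation. The sketch below illustrates one plausible reading and rests on several assumptions:

- A single diagonal-covariance GMM stands in for the acoustic model.
- Only the means are MAP-adapted, in the style of Gauvain and Lee, from per-utterance sufficient statistics.
- GSD is taken as the Euclidean distance between weight- and variance-normalised mean supervectors.

The function names, the relevance factor, and the rule of sending the largest-GSD utterances for manual transcription are all hypothetical. The paper's exact GSD definition and selection direction may differ.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def map_adapt_means(means: Matrix, counts: Sequence[float],
                    first_order: Matrix, relevance: float = 16.0) -> Matrix:
    """MAP re-estimation of GMM means only (weights and variances kept fixed).

    means[k]       -- prior (out-of-domain) mean of component k
    counts[k]      -- zeroth-order statistic n_k = sum_t gamma_k(t)
    first_order[k] -- first-order statistic f_k = sum_t gamma_k(t) * x_t
    relevance      -- hypothetical relevance factor tau > 0

    Update: mu_k' = (tau * mu_k + f_k) / (tau + n_k); a component that
    receives no data (n_k == 0) keeps its prior mean.
    """
    if relevance <= 0:
        raise ValueError("relevance factor must be positive")
    return [[(relevance * m + f) / (relevance + n_k) for m, f in zip(mu, f_k)]
            for mu, n_k, f_k in zip(means, counts, first_order)]


def supervector(weights, means, variances) -> List[float]:
    """Stack sqrt(w_k) * Sigma_k^{-1/2} * mu_k for diagonal covariances."""
    sv: List[float] = []
    for w, mu, var in zip(weights, means, variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return sv


def gaussian_supervector_distance(weights, prior_means, adapted_means,
                                  variances) -> float:
    """Assumed GSD: Euclidean distance between normalised mean supervectors,
    i.e. sqrt(sum_k w_k * sum_d (mu'_kd - mu_kd)^2 / sigma^2_kd)."""
    a = supervector(weights, prior_means, variances)
    b = supervector(weights, adapted_means, variances)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_for_transcription(gsd_scores: Dict[str, float],
                             budget: int) -> List[str]:
    """Hypothetical rule: pick the `budget` utterances whose adaptation moved
    the model most. The paper's actual selection direction may differ."""
    return sorted(gsd_scores, key=gsd_scores.get, reverse=True)[:budget]


if __name__ == "__main__":
    # Toy 2-component, 1-dimensional GMM standing in for the acoustic model.
    w, mu, var = [0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]]
    # Per-utterance statistics; in practice these would come from aligning
    # the audio against the first-pass (out-of-domain) ASR transcript.
    stats = {"utt_a": ([4.0, 0.0], [[8.0], [0.0]]),
             "utt_b": ([1.0, 1.0], [[0.0], [1.0]])}
    scores = {utt: gaussian_supervector_distance(
                  w, mu, map_adapt_means(mu, n, f, relevance=4.0), var)
              for utt, (n, f) in stats.items()}
    print(scores, select_for_transcription(scores, budget=1))
```

In the toy run, `utt_a` pulls component 0 from 0.0 to 1.0 and so receives a non-zero GSD. The statistics for `utt_b` agree with the prior means, so they leave the model unchanged and give a GSD of 0. Any real system would instead accumulate these statistics over a full HMM-GMM acoustic model.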

Pages: 5