JOINT ACOUSTIC FACTOR LEARNING FOR ROBUST DEEP NEURAL NETWORK BASED AUTOMATIC SPEECH RECOGNITION

Cited by: 0

Authors:

Kundu, Souvik [1]

Mantena, Gautam [1]

Qian, Yanmin [2]

Tan, Tian [2]

Delcroix, Marc [3]

Sim, Khe Chai [1]

Affiliations:

[1] National University of Singapore, School of Computing, Singapore

[2] Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China

[3] NTT Corporation, NTT Communication Science Laboratories, Kyoto, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

deep neural networks; joint factor learning; adaptation; bottleneck vectors; robust speech recognition; SPEAKER ADAPTATION; TRANSFORMATIONS;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

Deep neural networks (DNNs) for acoustic modeling have been shown to provide impressive results in many state-of-the-art automatic speech recognition (ASR) applications. However, DNN performance degrades when training and testing conditions are mismatched, so adaptation is necessary. In this paper, we explore the use of discriminative auxiliary input features, obtained using joint acoustic factor learning, for DNN adaptation. These features are derived from a bottleneck (BN) layer of a DNN and are referred to as BN vectors. To derive the BN vectors, we explore two types of joint acoustic factor learning, which capture speaker information together with auxiliary information such as noise, phone, and articulatory information of speech. We show that these BN vectors can be used for adaptation and thereby improve the performance of an ASR system. We also show that performance improves further when the BN vectors are combined with conventional i-vectors. Experiments are performed on the Aurora-4, REVERB challenge, and AMI databases.
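
As a rough illustration of what "auxiliary input features" can mean in practice, the sketch below appends an utterance-level BN vector and an i-vector to every acoustic frame before it is fed to the acoustic-model DNN. This is a minimal sketch under assumptions, not the authors' implementation: the function name `augment_frames`, the toy dimensions, and the choice of simple per-frame concatenation are all hypothetical and are not specified in the abstract.

```python
from typing import List

Vector = List[float]


def augment_frames(frames: List[Vector], bn_vector: Vector, i_vector: Vector) -> List[Vector]:
    """Append the same utterance-level auxiliary vector to every frame.

    Assumption: the BN vector and i-vector are each computed once per
    utterance (or per speaker) and simply concatenated to the frame features.
    """
    aux = bn_vector + i_vector  # combined auxiliary vector
    return [frame + aux for frame in frames]


# Toy example: 2 frames of 3-dimensional acoustic features (hypothetical sizes).
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
bn_vector = [1.0, 2.0]  # stand-in for a bottleneck-layer output
i_vector = [9.0]        # stand-in for a conventional i-vector

augmented = augment_frames(frames, bn_vector, i_vector)
for row in augmented:
    print(row)
```

Each augmented frame has the original feature dimension plus the dimensions of both auxiliary vectors, so the DNN input layer would need to be sized accordingly.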


Pages: 5025-5029

Number of pages: 5

