Improving Deep Neural Networks Based Multi-Accent Mandarin Speech Recognition Using I-Vectors and Accent-Specific Top layer

Cited by: 0
Authors
Chen, Mingming [1 ]
Yang, Zhanlei [1 ]
Liang, Jizhong [2 ]
Li, Yanpeng [2 ]
Liu, Wenju [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] China State Grid Corp, ShanXi Elect Power Co, Elect Power Res Inst, Beijing, Peoples R China
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
Accented speech recognition; deep neural networks; model adaptation; i-vectors; KL-divergence regularization;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we propose a method that uses i-vectors and model adaptation techniques to improve the performance of deep neural network (DNN) based multi-accent Mandarin speech recognition. I-vectors, which are speaker-specific features, have been shown to be effective for accent identification. They can be used together with conventional spectral features as the input features of DNNs to improve discrimination between different accents. Meanwhile, we adapt DNNs to different accents by using an accent-specific top layer and shared hidden layers. The accent-specific top layer adapts to each accent, while the shared hidden layers, which can be seen as feature extractors, extract discriminative high-level features across different accents. These two techniques are complementary and can be easily combined. Our experiments on the 400-hour Intel Accented Mandarin Speech Recognition Corpus show that the proposed method significantly improves the performance of DNN-based accented Mandarin speech recognition.
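The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the sigmoid activation, the random weights, and the accent names are all hypothetical placeholders. It only shows an i-vector being appended to each frame's spectral features, and a forward pass that runs through shared hidden layers before selecting an accent-specific output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
SPECTRAL_DIM = 40   # per-frame spectral features
IVECTOR_DIM = 100   # one i-vector per speaker or utterance
HIDDEN_DIM = 64
NUM_HIDDEN = 3
NUM_STATES = 50     # output classes, e.g. tied HMM states
ACCENTS = ["accent_a", "accent_b", "accent_c"]  # placeholder accent names


def init_layer(n_in, n_out):
    """Return randomly initialised (weights, bias) for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


# Hidden layers shared by all accents; the first one sees spectral + i-vector input.
shared = []
n_in = SPECTRAL_DIM + IVECTOR_DIM
for _ in range(NUM_HIDDEN):
    shared.append(init_layer(n_in, HIDDEN_DIM))
    n_in = HIDDEN_DIM

# One top (output) layer per accent.
top = {accent: init_layer(HIDDEN_DIM, NUM_STATES) for accent in ACCENTS}


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(frames, ivector, accent):
    """Compute per-frame state posteriors using the top layer of `accent`."""
    # Repeat the i-vector for every frame and append it to the spectral features.
    x = np.hstack([frames, np.tile(ivector, (frames.shape[0], 1))])
    for w, b in shared:
        x = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid hidden units
    w, b = top[accent]
    return softmax(x @ w + b)


frames = rng.normal(size=(5, SPECTRAL_DIM))  # five frames of dummy features
ivector = rng.normal(size=IVECTOR_DIM)       # dummy i-vector
posteriors = forward(frames, ivector, "accent_a")
print(posteriors.shape)
```

In this sketch, switching accents changes only the final weight matrix and bias, so the hidden layers act as a common feature extractor for every accent.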
Pages: 3620-3624
Page count: 5