An empirical study of cross-lingual transfer learning techniques for small-footprint keyword spotting

Cited by: 7
Authors
Sun, Ming [1]
Schwarz, Andreas [1]
Wu, Minhua [1]
Strom, Nikko [1]
Matsoukas, Spyros [1]
Vitaladevuni, Shiv [1]
Affiliations
[1] Amazon.com, Alexa Machine Learning, Seattle, WA, USA
Source
2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) | 2017
Keywords
transfer learning; keyword spotting; cross lingual; small-footprint; canonical correlation analysis; speech
DOI
10.1109/ICMLA.2017.0-150
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents our work on building a small-footprint keyword spotting system, one with low CPU, memory, and latency requirements, for a resource-limited language. Our keyword spotting system combines a deep neural network (DNN) with a hidden Markov model (HMM) in a hybrid DNN-HMM decoder. We investigate different transfer learning techniques that leverage knowledge and data from a resource-abundant source language to improve keyword DNN training for a target language with limited in-domain data. The approaches employed in this paper include training a DNN on source language data to initialize the target language DNN training; mixing data from the source and target languages in a multi-task DNN training setup; using logits computed by a DNN trained on the source language data to regularize the keyword DNN training in the target language; and combinations of these techniques. Across different amounts of target language training data, our experimental results show that these transfer learning techniques improve keyword spotting performance for the target language, as measured by the area under the curve (AUC) of DNN-HMM decoding detection error tradeoff (DET) curves on a large in-house far-field test set.
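The logit-regularization idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the form of the penalty nor its weight, so the squared-error penalty, the `weight` parameter, the function names, and the assumption that the source and target DNNs share an output layout are all hypothetical choices made only for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def logit_regularized_loss(target_logits, source_logits, labels, weight=0.5):
    """Cross-entropy on target-language labels plus a penalty that pulls the
    target DNN's logits toward those of a DNN trained on the source language.

    Assumptions (not from the abstract): the penalty is a mean squared error,
    `weight` interpolates between the two terms, and both networks emit
    logits of the same shape (frames x output classes).
    """
    n_frames = target_logits.shape[0]
    log_probs = log_softmax(target_logits)
    # Average negative log-likelihood of the correct class per frame.
    cross_entropy = -log_probs[np.arange(n_frames), labels].mean()
    # Mean squared distance between target and source logits.
    logit_penalty = np.mean((target_logits - source_logits) ** 2)
    return (1.0 - weight) * cross_entropy + weight * logit_penalty
```

With `weight=0.0` the loss reduces to ordinary cross-entropy on the target language labels, and with `weight=1.0` the target network is trained only to match the source network's logits; intermediate values trade off the two signals.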
Pages: 255-260
Page count: 6