A TRANSFER LEARNING AND PROGRESSIVE STACKING APPROACH TO REDUCING DEEP MODEL SIZES WITH AN APPLICATION TO SPEECH ENHANCEMENT

Citations: 0
Authors
Wang, Sicheng [1]
Li, Kehuang [1]
Huang, Zhen [1]
Siniscalchi, Sabato Marco [1, 2]
Lee, Chin-Hui [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Univ Enna Kore, I-94100 Enna, Italy
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords
Transfer learning; model compression; model stacking; multi-task training; speech enhancement; RECOGNITION;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Leveraging transfer learning, we distill the knowledge in a conventional wide and deep neural network (DNN) into a narrower yet deeper model with fewer parameters and comparable system performance for speech enhancement. We present three transfer-learning solutions to accomplish this goal. First, in sequential transfer learning, the knowledge embedded in the output values of a high-performance DNN is used to guide the training of a smaller DNN model. Second, in multi-task transfer learning, the smaller DNN is trained to learn the output values of the larger DNN and the speech enhancement task in parallel. Finally, progressive stacking transfer learning is accomplished through multi-task learning and DNN stacking. Our experimental evidence demonstrates a 5-fold reduction in parameters while maintaining similar enhancement performance with the proposed framework.
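As a rough illustration of the ideas in the abstract, the following minimal Python sketch counts the parameters of a wide, shallow fully connected network against a narrower, deeper one, and combines a teacher-matching term with an enhancement term in one loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the layer sizes, the function names `count_params` and `multitask_loss`, the use of mean-squared error, the single shared output, and the weight `alpha`.

```python
import numpy as np


def count_params(layer_sizes):
    """Count weights plus biases of a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def multitask_loss(student_out, teacher_out, clean_target, alpha=0.5):
    """Weighted sum of two mean-squared errors.

    alpha is a hypothetical mixing weight (not from the paper):
    alpha = 1.0 trains the student on the teacher outputs only,
    alpha = 0.0 trains it on the clean-speech targets only.
    """
    distill = np.mean((student_out - teacher_out) ** 2)
    enhance = np.mean((student_out - clean_target) ** 2)
    return alpha * distill + (1.0 - alpha) * enhance


# Hypothetical layer sizes, chosen only to show the size comparison.
wide = [257, 2048, 2048, 2048, 257]
narrow = [257, 512, 512, 512, 512, 512, 512, 257]

ratio = count_params(wide) / count_params(narrow)
print(f"wide: {count_params(wide)}, narrow: {count_params(narrow)}, "
      f"ratio: {ratio:.2f}")
```

Under this sketch, setting `alpha` to 1.0 corresponds loosely to the first solution (the student imitates the teacher's outputs), while intermediate values correspond loosely to the multi-task solution. The progressive stacking solution is not modeled here.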
Pages: 5575-5579
Page count: 5