A TRANSFER LEARNING AND PROGRESSIVE STACKING APPROACH TO REDUCING DEEP MODEL SIZES WITH AN APPLICATION TO SPEECH ENHANCEMENT

Cited by: 0

Authors:

Wang, Sicheng ^{[1]}

Li, Kehuang ^{[1]}

Huang, Zhen ^{[1]}

Siniscalchi, Sabato Marco ^{[1,2]}

Lee, Chin-Hui ^{[1]}

Affiliations:

[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

[2] Univ Enna Kore, I-94100 Enna, Italy

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

Transfer learning; model compression; model stacking; multi-task training; speech enhancement; RECOGNITION;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Leveraging transfer learning, we distill the knowledge in a conventional wide and deep neural network (DNN) into a narrower yet deeper model with fewer parameters and comparable system performance for speech enhancement. We present three transfer-learning solutions to accomplish this goal. First, in sequential transfer learning, the knowledge embedded in the output values of a high-performance DNN is used to guide the training of a smaller DNN model. Second, in multi-task transfer learning, the smaller DNN is trained to learn the output values of the larger DNN and the speech enhancement task in parallel. Finally, progressive stacking transfer learning is accomplished through multi-task learning and DNN stacking. Our experimental evidence demonstrates a 5-fold parameter reduction while maintaining similar enhancement performance with the proposed framework.
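The two ideas in the abstract that lend themselves to a small worked example are the parameter budget of a wide network versus a narrower, deeper one, and the multi-task objective in which the smaller network fits the larger network's outputs and the enhancement target at the same time. The sketch below is illustrative only and is not taken from the paper: the layer sizes, the use of mean squared error, the single shared output, and the weighting factor `alpha` are all assumptions, and the function names are hypothetical.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first, output last.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(student_out, teacher_out, clean_target, alpha=0.5):
    """Weighted sum of a teacher-matching term and an enhancement term.

    Assumption: both tasks share one student output and both use MSE.
    alpha = 1.0 keeps only the teacher term, which resembles training
    on teacher outputs alone; alpha = 0.0 is ordinary supervised training.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * mse(student_out, teacher_out)
            + (1.0 - alpha) * mse(student_out, clean_target))


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to illustrate the trade-off.
    wide = [64, 1024, 1024, 64]
    narrow_deep = [64, 256, 256, 256, 256, 256, 64]
    print("wide parameters:       ", mlp_param_count(wide))
    print("narrow/deep parameters:", mlp_param_count(narrow_deep))

    # Toy feature vectors standing in for enhanced-speech outputs.
    print("loss:", multitask_loss([0.2, 0.4], [0.25, 0.35], [0.3, 0.3]))
```

Setting `alpha` between 0 and 1 mixes the two objectives. How the paper actually balances them, and how progressive stacking grows the smaller network, is not specified in the abstract, so neither is modeled here.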

Citation

Pages: 5575-5579

Page count: 5

