Cross-language Transfer Learning for Deep Neural Network Based Speech Enhancement

Cited by: 0

Authors:

Xu, Yong [1]

Du, Jun [1]

Dai, Li-Rong [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA, USA

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Keywords:

speech enhancement; deep neural network; transfer learning; multi-lingual; resource-limited language; NOISE; ENVIRONMENTS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose a transfer learning approach for speech enhancement based on deep neural networks (DNNs): a well-trained model obtained with high-resource materials of one language is adapted to another target language using a small amount of adaptation data. We investigate the performance degradation that occurs when noisy Mandarin speech is enhanced using DNN models trained with only English speech materials, and vice versa. Viewing the hidden layers of the well-trained DNN regression model as a cascade of feature extractors, we hypothesize that the first several layers should be transferable between languages. Our experimental results indicate that even with only about 1 minute of adaptation data from the resource-limited language, we can achieve a considerable performance improvement over the DNN model without cross-language transfer learning.
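The layer-transfer idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the transferred lower layers are kept fixed, only the remaining layers are updated on target-language adaptation data, and plain SGD on squared error is used. The network sizes, the toy input/target pair, and all function names (`init_model`, `forward`, `squared_error`, `adapt_step`) are hypothetical and are not taken from the paper.

```python
import copy
import math
import random


def init_model(sizes, seed=0):
    """Build a small MLP as a list of (weights, biases) layers."""
    rng = random.Random(seed)
    model = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        model.append((weights, [0.0] * n_out))
    return model


def forward(model, x):
    """Return the input plus every layer's output (tanh hidden layers, linear output)."""
    acts = [x]
    for i, (weights, biases) in enumerate(model):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + b for row, b in zip(weights, biases)]
        is_output = i == len(model) - 1
        acts.append(z if is_output else [math.tanh(v) for v in z])
    return acts


def squared_error(model, x, target):
    """Half the squared distance between the model output and the target."""
    y = forward(model, x)[-1]
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def adapt_step(model, x, target, num_frozen, lr=0.05):
    """One SGD step on squared error that leaves the first num_frozen layers untouched."""
    acts = forward(model, x)
    delta = [yi - ti for yi, ti in zip(acts[-1], target)]  # gradient at the linear output
    for i in reversed(range(num_frozen, len(model))):
        weights, biases = model[i]
        inp = acts[i]
        prev_delta = None
        if i > num_frozen:
            # Back-propagate through this layer (and the tanh below it)
            # before this layer's weights are changed.
            prev_delta = [
                sum(weights[j][k] * delta[j] for j in range(len(weights))) * (1.0 - inp[k] ** 2)
                for k in range(len(inp))
            ]
        for j in range(len(weights)):
            for k in range(len(inp)):
                weights[j][k] -= lr * delta[j] * inp[k]
            biases[j] -= lr * delta[j]
        delta = prev_delta


# Assumption: this randomly initialised net stands in for a well-trained source-language DNN.
source_model = init_model([4, 8, 8, 2], seed=1)
adapted_model = copy.deepcopy(source_model)  # transfer all layers, then adapt the top one

x, target = [0.2, -0.1, 0.4, 0.3], [0.5, -0.5]  # toy stand-in for adaptation data
loss_before = squared_error(adapted_model, x, target)
for _ in range(20):
    adapt_step(adapted_model, x, target, num_frozen=2)
loss_after = squared_error(adapted_model, x, target)
```

Here `num_frozen=2` keeps the two lower layers identical to the source model, which mirrors the hypothesis that the first several layers act as language-independent feature extractors. How many layers to keep fixed is a design choice that this sketch does not settle.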

Citation

Pages: 336 / +

Page count: 2
