Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation

Cited by: 23

Authors:

Chen, Jitong [1]

Wang, DeLiang [1,2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

speech separation; speaker generalization; long short-term memory; noise; algorithm

DOI:

10.21437/Interspeech.2016-551

Chinese Library Classification:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Speech separation can be formulated as a supervised learning problem where a time-frequency mask is estimated by a learning machine from acoustic features of noisy speech. Deep neural networks (DNNs) have been successful for noise generalization in supervised separation. However, real-world applications require a trained model to perform well with both unseen speakers and unseen noises. In this study, we investigate speaker generalization for noise-independent models and propose a separation model based on long short-term memory to account for the temporal dynamics of speech. Our experiments show that the proposed model significantly outperforms a DNN in terms of objective speech intelligibility for both seen and unseen speakers. Compared to feedforward networks, the proposed model is more capable of modeling a large number of speakers, and represents an effective approach for speaker- and noise-independent speech separation.

Pages: 3314-3318

Page count: 5

References (20 in total):

[1] [Anonymous], 2007, Speech Enhancement: Theory and Practice

[2] Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises [J]. Chen, Jitong; Wang, Yuxuan; Yoho, Sarah E.; Wang, DeLiang; Healy, Eric W. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 139(05): 2604-2612

[3] Noise perturbation for supervised speech separation [J]. Chen, Jitong; Wang, Yuxuan; Wang, DeLiang. SPEECH COMMUNICATION, 2016, 78: 1-10

[4] SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME SPECTRAL AMPLITUDE ESTIMATOR [J]. EPHRAIM, Y; MALAH, D. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32(06): 1109-1121

[5] Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors [J]. Erkelens, Jan S.; Hendriks, Richard C.; Heusdens, Richard; Jensen, Jesper. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(06): 1741-1752

[6] Graves A, 2013, INT CONF ACOUST SPEE, P6645, DOI 10.1109/ICASSP.2013.6638947

[7] An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type [J]. Healy, Eric W.; Yoho, Sarah E.; Chen, Jitong; Wang, Yuxuan; Wang, DeLiang. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2015, 138(03): 1660-1669

[8] An algorithm to improve speech recognition in noise for hearing-impaired listeners [J]. Healy, Eric W.; Yoho, Sarah E.; Wang, Yuxuan; Wang, DeLiang. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134(04): 3029-3038

[9] Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]

[10] Kingma Diederik P., 2014, arXiv
