Reconstruction techniques for improving the perceptual quality of binary masked speech

Times cited: 29

Authors:

Williamson, Donald S. [1]

Wang, Yuxuan [1]

Wang, DeLiang [1,2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2014 / Vol. 136 / No. 2

Keywords:

NONNEGATIVE MATRIX FACTORIZATION; SPARSE REPRESENTATION; INTELLIGIBILITY; NOISE; ALGORITHM; FEATURES

DOI:

10.1121/1.4884759

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This study proposes an approach to improve the perceptual quality of speech separated by binary masking through the use of reconstruction in the time-frequency domain. Non-negative matrix factorization and sparse reconstruction approaches are investigated, both using a linear combination of basis vectors to represent a signal. In this approach, the short-time Fourier transform (STFT) of separated speech is represented as a linear combination of STFTs from a clean speech dictionary. Binary masking for separation is performed using deep neural networks or Bayesian classifiers. The perceptual evaluation of speech quality, which is a standard objective speech quality measure, is used to evaluate the performance of the proposed approach. The results show that the proposed techniques improve the perceptual quality of binary masked speech, and outperform traditional time-frequency reconstruction approaches. (C) 2014 Acoustical Society of America.
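The central idea in the abstract, representing the separated speech spectrogram as a non-negative linear combination of clean-speech dictionary atoms, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it works on magnitude spectra only (phase is ignored), assumes the dictionary `W` has already been learned from clean speech, and fits the activations with the squared Euclidean cost using Lee-Seung multiplicative updates. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative activations H such that V ~= W @ H, with W held fixed.

    Assumption: squared Euclidean cost ||V - W H||_F^2, minimized with
    Lee-Seung multiplicative updates applied to H only.
    V: (F, T) non-negative magnitude spectrogram of binary-masked speech.
    W: (F, K) non-negative dictionary of clean-speech spectral bases.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V  # constant across iterations because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        # Elementwise multiplicative update; H stays non-negative at every step.
        H *= WtV / (WtW @ H + eps)
    return H


def reconstruct(masked_mag, W, n_iter=200):
    """Replace the masked magnitude with its approximation by dictionary atoms."""
    H = nmf_activations(masked_mag, W, n_iter=n_iter)
    return W @ H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F, K, T = 64, 8, 50
    W = rng.random((F, K))                            # stand-in "clean speech" dictionary
    clean = W @ rng.random((K, T))                    # synthetic clean magnitude spectrogram
    mask = (rng.random((F, T)) > 0.3).astype(float)   # stand-in binary mask
    masked = clean * mask                             # binary masking zeroes some T-F units
    recon = reconstruct(masked, W)
    print(recon.shape)  # (64, 50)
```

Holding `W` fixed restricts the output to spectral shapes drawn from clean speech, which is the intuition behind using reconstruction to fill in time-frequency units that the binary mask removed. A sparse-reconstruction variant would instead solve for the activations under a sparsity constraint (for example an l1 penalty or a greedy pursuit) rather than non-negativity alone.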


Pages: 892-902

Number of pages: 11

References: 55 in total

