Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation

Times cited: 27

Authors:

Gao, Bin [1]

Woo, W. L. [2]

Khor, C. [2]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China

[2] Newcastle Univ, Sch Elect & Elect Engn, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2014 / Vol. 135 / No. 3

Keywords:

SEGREGATION;

DOI:

10.1121/1.4864294

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

An unsupervised single-channel audio separation method is presented from a pattern recognition viewpoint. The proposed method requires no training data, and the separation system is based on non-uniform time-frequency (TF) analysis and feature extraction. Unlike conventional approaches that rely on the spectrogram or its variants, the proposed separation algorithm uses an alternative TF representation based on the gammatone filterbank. In particular, the monaural mixed audio signal is shown to be considerably more separable in this non-uniform TF domain, and an analysis of signal separability verifying this finding is provided. In addition, a variational Bayesian approach is derived to learn the sparsity parameters that optimize the matrix factorization. Experimental tests show that extracting the spectral dictionary and temporal codes is more efficient with sparsity learning, which in turn leads to better separation performance. (C) 2014 Acoustical Society of America.


Pages: 1171-1185

Page count: 15

Number of references: 34
