Audio signal clustering and separation using a stacked autoencoder

Cited: 0
Authors
Jang, Gil-Jin [1]
Affiliations
[1] Kyungpook Natl Univ, Sch Elect Engn, 80 Daehakro, Daegu 41566, South Korea
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | 2016, Vol. 35, No. 4
Keywords
Audio signal separation; Autoencoder; Deep neural networks; Audio clustering
DOI
10.7776/ASK.2016.35.4.303
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper proposes a novel approach to audio signal clustering using a stacked autoencoder. The stacked autoencoder learns an efficient representation of the input signal, enabling the clustering of constituent signals with similar characteristics, so that the original sources can be separated based on the clustering results. An STFT (Short-Time Fourier Transform) is performed to extract the time-frequency spectrum, and rectangular windows at all possible locations are used as input values to the autoencoder. The outputs of the middle (encoding) layer are used to cluster the rectangular windows, and the original sources are separated by Wiener filters derived from the clustering results. Source separation experiments were carried out in comparison with conventional NMF (Non-negative Matrix Factorization), and the estimated sources by the proposed
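The pipeline the abstract describes can be sketched end to end. This is an illustrative reconstruction, not the paper's implementation: it uses a single hidden layer instead of a stacked autoencoder, a toy two-tone mixture instead of real audio, a hand-rolled k-means with k=2, and arbitrary patch and layer sizes.

```python
# Sketch of: STFT -> rectangular time-frequency patches -> autoencoder
# encodings -> clustering -> Wiener-style masks. All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture of two tones (stand-in for a real audio mixture).
fs = 8000
t = np.arange(fs) / fs
mixture = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 1320 * t)

# STFT magnitude spectrogram (Hann window, 50 % overlap).
win, hop = 256, 128
w = np.hanning(win)
frames = np.stack([mixture[i:i + win] * w
                   for i in range(0, len(mixture) - win, hop)])
spec = np.abs(np.fft.rfft(frames, axis=1)).T      # shape: (freq, time)

# Rectangular windows over the spectrogram as autoencoder inputs.
ph, pw = 8, 8                                     # patch height/width
patches, coords = [], []
for f in range(0, spec.shape[0] - ph, ph):
    for tt in range(0, spec.shape[1] - pw, pw):
        patches.append(spec[f:f + ph, tt:tt + pw].ravel())
        coords.append((f, tt))
X = np.array(patches)
X = X / (X.max() + 1e-9)                          # normalize to [0, 1]

# Tiny one-hidden-layer autoencoder trained by plain gradient descent.
d, h = X.shape[1], 4
W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)
lr = 0.5
for _ in range(300):
    H = np.tanh(X @ W1 + b1)                      # encoding layer
    Y = H @ W2 + b2                               # reconstruction
    G = (Y - X) / len(X)                          # MSE gradient
    GH = (G @ W2.T) * (1 - H ** 2)
    W2 -= lr * H.T @ G;  b2 -= lr * G.sum(0)
    W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0)
codes = np.tanh(X @ W1 + b1)                      # encoding-layer outputs

# k-means (k = 2) on the encodings.
c = codes[rng.choice(len(codes), 2, replace=False)]
for _ in range(20):
    d2 = ((codes[:, None, :] - c[None]) ** 2).sum(-1)
    lab = d2.argmin(1)
    for k in range(2):
        if np.any(lab == k):
            c[k] = codes[lab == k].mean(0)

# Wiener-style masks: each cluster's power over total power per bin.
power = np.zeros((2,) + spec.shape)
for (f, tt), k in zip(coords, lab):
    power[k, f:f + ph, tt:tt + pw] += spec[f:f + ph, tt:tt + pw] ** 2
masks = power / (power.sum(0) + 1e-9)             # in [0, 1], sum <= 1
```

Applying each mask to the complex STFT and inverting it would yield the separated source estimates; the paper's actual network depth, window sizes, and clustering settings are not given in this record.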
Pages: 303-309
Page count: 7
Related papers
5 records
[1] Hinton, G. E.; Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. SCIENCE, 2006, 313(5786): 504-507.
[2] Hinton, Geoffrey E.; Osindero, Simon; Teh, Yee-Whye. A fast learning algorithm for deep belief nets. NEURAL COMPUTATION, 2006, 18(07): 1527-1554.
[3] Hu, G. N.; Wang, D. L. Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2004, 15(05): 1135-1150.
[4] Raj, B. Proc. INTERSPEECH, 2010, p. 717.
[5] Vincent, P. J. MACH. LEARN. RES., 2010, 11: 3371.