Underdetermined blind source separation using CapsNet

Cited by: 0
Authors
M. Kumar
V. E. Jayanthi
Affiliations
[1] Chettinad College of Engineering and Technology
[2] PSNA College of Engineering and Technology
Source
Soft Computing | 2020 / Vol. 24
Keywords
Array signal processing; Blind source separation; Capsule networks; Speech recognition; Time–frequency masking
DOI
Not available
Abstract
This paper addresses the separation of speech source signals from underdetermined convolutive mixtures using a capsule network (CapsNet). The objective is twofold: (1) to improve the underdetermined convolutive blind source separation algorithm in terms of signal-to-distortion ratio, signal-to-interference ratio and signal-to-artifact ratio; and (2) to reduce the computational burden of the algorithm so that it is useful for applications such as speech recognition systems. The time–frequency points of the observed mixture signals are the input to the first layer of the CapsNet. In the first layer, single-source active points (SSPs) are identified using the ratio of mixtures. These SSPs serve as the lower-level capsules of the system. In the second layer, cluster centers are found using a dynamic routing algorithm, and the resulting clusters are used to construct a binary mask. Finally, the algorithm solves the permutation problem by computing the correlation between the amplitudes of adjacent frequency bins. The algorithm is tested on live-recorded mixture signals obtained in a real environment and on synthetically convolved mixture signals. The test results show the effectiveness of the proposed method compared with existing algorithms in terms of computational load, signal-to-distortion ratio and signal-to-interference ratio.
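The last step of the abstract, resolving the permutation ambiguity by correlating amplitudes of adjacent frequency bins, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `align_permutations`, the array layout `(n_bins, n_sources, n_frames)`, the use of Pearson correlation, and the exhaustive search over permutations are all assumptions made for illustration, and the exhaustive search is only practical for a small number of sources.

```python
import itertools
import numpy as np

def align_permutations(amp):
    """Reorder sources in each frequency bin to match the previous bin.

    amp: array of shape (n_bins, n_sources, n_frames) holding the
    magnitude envelopes of the separated signals in each bin
    (hypothetical layout, assumed for this sketch).
    Returns the reordered array and the permutation chosen per bin.
    """
    n_bins, n_src, _ = amp.shape
    aligned = amp.copy()
    perms = [tuple(range(n_src))]  # bin 0 is taken as the reference order
    for f in range(1, n_bins):
        prev = aligned[f - 1]
        best_perm, best_score = None, -np.inf
        for perm in itertools.permutations(range(n_src)):
            cand = amp[f, list(perm)]
            # Sum of Pearson correlations between matched envelopes.
            score = sum(np.corrcoef(prev[k], cand[k])[0, 1]
                        for k in range(n_src))
            if score > best_score:
                best_perm, best_score = perm, score
        aligned[f] = amp[f, list(best_perm)]
        perms.append(best_perm)
    return aligned, perms
```

Each bin is compared against the already aligned neighbouring bin, so a correct ordering propagates upward in frequency. A practical system would likely need safeguards this sketch omits, for example handling bins whose envelopes are constant, where the correlation is undefined.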
Pages: 9011-9019
Page count: 8