Single channel source separation using time–frequency non-negative matrix factorization and sigmoid base normalization deep neural networks

Cited: 0
Authors
Yannam Vasantha Koteswararao
C. B. Rama Rao
Affiliation
[1] Department of ECE, National Institute of Technology
Source
Multidimensional Systems and Signal Processing | 2022 / Vol. 33
Keywords
Short-time Fourier transform; Time–frequency non-negative matrix factorization; Sigmoid base normalization; Deep neural networks; Soft mask; Inverse STFT operation;
DOI
Not available
Abstract
Conventional single-channel speech separation faces two long-standing issues. The first is over-smoothing, which is addressed by using the estimated signals to expand the training data set. The second is incomplete separation, which a DNN mitigates by generating prior knowledge, thereby reducing speech distortion. To overcome these issues, we propose single-channel source separation using time–frequency non-negative matrix factorization (TFNMF) together with sigmoid-based normalization deep neural networks (SNDNN). The proposed system consists of two phases: a training phase and a testing phase. The two phases differ in their inputs: the training phase uses a single-channel clean input signal, while the testing phase uses a single-channel multi-talker input signal. Both phases pass their input signals to the Short-Time Fourier Transform (STFT), which converts them into spectrograms. Features are then extracted from the spectrograms with TFNMF and classified with the SNDNN algorithm, and the classified features are converted to a softmax soft mask. Finally, the inverse STFT (ISTFT) is applied to the masked spectrograms to recover the separated speech signals. Experimental results demonstrate that the proposed architecture achieves better performance than existing methods.
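The overall pipeline in the abstract (STFT → non-negative factorization of the magnitude spectrogram → soft mask → ISTFT) can be sketched as follows. This is a minimal illustrative sketch only: the paper's TFNMF feature extraction and SNDNN classifier are not reproduced here; instead, a plain NMF is used and its basis components are simply split between the two sources, a common textbook simplification rather than the authors' method.

```python
# Hedged sketch: STFT -> NMF on magnitude spectrogram -> Wiener-like
# soft mask -> ISTFT. Stands in for the paper's TFNMF+SNDNN pipeline.
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

fs = 8000
t = np.arange(fs) / fs
# Synthetic two-source "mixture": two tones at different frequencies
mixture = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

# 1. STFT: time-domain mixture -> complex spectrogram
f, frames, Z = stft(mixture, fs=fs, nperseg=512)
V = np.abs(Z)  # non-negative magnitude spectrogram

# 2. NMF: V ~= W @ H with non-negative factors
model = NMF(n_components=4, init="nndsvda", max_iter=400, random_state=0)
W = model.fit_transform(V)
H = model.components_

# 3. Assign half of the components to each source (illustrative grouping)
#    and build a Wiener-like soft mask in [0, 1]
eps = 1e-10
S1 = W[:, :2] @ H[:2, :]
S2 = W[:, 2:] @ H[2:, :]
mask1 = S1 / (S1 + S2 + eps)

# 4. Apply the masks to the complex spectrogram and invert with ISTFT
_, est1 = istft(mask1 * Z, fs=fs, nperseg=512)
_, est2 = istft((1.0 - mask1) * Z, fs=fs, nperseg=512)
```

Because the two masks sum to one at every time–frequency bin, the two estimates sum back (up to numerical error) to the ISTFT of the unmasked mixture, which is the usual consistency property of soft masking.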
Pages: 1023–1043
Page count: 20