Ensemble System of Deep Neural Networks for Single-Channel Audio Separation

Cited by: 2
Authors
Al-Kaltakchi, Musab T. S. [1 ]
Mohammad, Ahmad Saeed [2 ]
Woo, Wai Lok [3 ]
Affiliations
[1] Mustansiriyah Univ, Coll Engn, Dept Elect Engn, Baghdad, Iraq
[2] Mustansiriyah Univ, Coll Engn, Dept Comp Engn, Baghdad, Iraq
[3] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Keywords
single-channel audio separation; deep neural networks; ideal binary mask; feature fusion; extreme learning machine; nonnegative matrix factorization; speech separation; algorithm
DOI
10.3390/info14070352
CLC Number
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Speech separation is a well-known problem, especially when only one sound mixture is available. Estimating the Ideal Binary Mask (IBM) is one solution, and recent research has focused on the supervised classification approach, for which extracting features from the sources is the critical challenge. Speech separation has been accomplished using a variety of feature extraction models; the majority of them, however, concentrate on a single feature, and the complementary nature of diverse features has not been thoroughly investigated. In this paper, we propose a deep neural network (DNN) ensemble architecture to fully exploit the complementary nature of the diverse features obtained from raw acoustic features. Instead of employing the features acquired from the output layer, we examined the penultimate discriminative representations. The learned representations were then fused to produce a new feature vector, which was classified using an Extreme Learning Machine (ELM). In addition, a genetic algorithm (GA) was designed to optimize the parameters globally. Experimental results showed that the proposed system fully considers the various features and produces a high-quality IBM under different conditions.
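The Ideal Binary Mask mentioned in the abstract can be illustrated with a minimal NumPy sketch: a time-frequency unit is kept when the local speech-to-noise ratio exceeds a local criterion. The function name, the 0 dB threshold, and the toy spectrograms below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ideal_binary_mask(speech_spec, noise_spec, lc_db=0.0):
    """Compute an Ideal Binary Mask from magnitude spectrograms.

    A time-frequency unit is assigned 1 when its local
    speech-to-noise ratio (in dB) exceeds the local criterion
    `lc_db`, and 0 otherwise.
    """
    eps = 1e-12  # avoid log(0) and division by zero
    snr_db = 20.0 * np.log10((speech_spec + eps) / (noise_spec + eps))
    return (snr_db > lc_db).astype(np.float32)

# Toy example: 3 frequency bins x 2 time frames (hypothetical values)
speech = np.array([[2.0, 0.1],
                   [0.5, 3.0],
                   [0.01, 0.01]])
noise = np.ones((3, 2))
mask = ideal_binary_mask(speech, noise)
# Units where speech magnitude exceeds noise magnitude get mask = 1
```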
Pages: 24
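The ELM classifier named in the abstract can be sketched in its standard generic form: random, untrained input weights and an analytic least-squares solution for the output weights. This is a minimal illustration of the basic ELM formulation, assuming tanh hidden units and a pseudoinverse solve; the paper's actual classifier, hidden-layer size, and GA-optimized parameters may differ.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Train a basic Extreme Learning Machine.

    Input weights W and biases b are drawn at random and never
    updated; only the output weights beta are fitted, analytically,
    via the Moore-Penrose pseudoinverse of the hidden activations.
    """
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    beta = np.linalg.pinv(H) @ T      # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy binary task (XOR-like labels) as a hypothetical usage example
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
pred = (elm_predict(X, W, b, beta) > 0.5).astype(int)
```

With more hidden units than training samples, the pseudoinverse yields an exact interpolation of the targets, which is why training reduces to a single linear solve rather than iterative backpropagation.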