Ensemble System of Deep Neural Networks for Single-Channel Audio Separation

Cited by: 2

Authors:

Al-Kaltakchi, Musab T. S. [1]

Mohammad, Ahmad Saeed [2]

Woo, Wai Lok [3]

Affiliations:

[1] Mustansiriyah Univ, Coll Engn, Dept Elect Engn, Baghdad, Iraq

[2] Mustansiriyah Univ, Coll Engn, Dept Comp Engn, Baghdad, Iraq

[3] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England

Source:

INFORMATION | 2023 / Vol. 14 / Issue 07

Keywords:

single-channel audio separation; deep neural networks; ideal binary mask; feature fusion; EXTREME LEARNING-MACHINE; NONNEGATIVE MATRIX FACTORIZATION; SPEECH SEPARATION; ALGORITHM;

DOI:

10.3390/info14070352

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Speech separation is a well-known problem, especially when only one sound mixture is available. Estimating the Ideal Binary Mask (IBM) is one solution to this problem. Recent research has focused on supervised classification approaches, for which extracting features from the sources is a critical challenge. A variety of feature extraction models have been used for speech separation, but most of them concentrate on a single feature, and the complementary nature of different features has not been thoroughly investigated. In this paper, we propose a deep neural network (DNN) ensemble architecture to fully explore the complementary nature of the diverse features obtained from raw acoustic features. Instead of employing the features acquired from the output layer, we examined the penultimate discriminative representations. The learned representations were then fused to produce a new feature vector, which was classified using the Extreme Learning Machine (ELM). In addition, a genetic algorithm (GA) was developed to optimize the parameters globally. The experimental results showed that the proposed system took the various features fully into account and produced a high-quality IBM under different conditions.
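To make the terms in the abstract concrete, the following is a minimal NumPy sketch of three of the building blocks it names: computing an IBM from known source magnitudes, fusing several learned representations by concatenation, and classifying the fused vector with a basic ELM. It is an illustration under stated assumptions, not the authors' implementation: the function and class names (`ideal_binary_mask`, `fuse`, `TinyELM`), the 0 dB local criterion, the tanh hidden layer, and concatenation as the fusion rule are all assumptions, and the DNN ensemble and genetic algorithm are omitted.

```python
import numpy as np


def ideal_binary_mask(target_mag, interference_mag, lc_db=0.0):
    """Return 1 for time-frequency units whose local SNR exceeds lc_db, else 0.

    target_mag and interference_mag are magnitude spectrograms of the two
    sources (same shape). lc_db is the local criterion; 0 dB is an assumption.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    snr_db = 20.0 * np.log10((target_mag + eps) / (interference_mag + eps))
    return (snr_db > lc_db).astype(np.float64)


def fuse(representations):
    """Fuse per-network representations by concatenating along the feature axis.

    Each array has shape (n_samples, n_features_i). Concatenation is an
    assumed fusion rule chosen for simplicity.
    """
    return np.concatenate(representations, axis=1)


class TinyELM:
    """Basic extreme learning machine: random fixed hidden layer,
    output weights solved in closed form by least squares."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden activations; W and b are random and never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Least-squares output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        # Threshold the real-valued output at 0.5 to get binary mask labels.
        return (self._hidden(X) @ self.beta > 0.5).astype(np.float64)
```

In this sketch, each row passed to `TinyELM` would correspond to one time-frequency unit, with the fused representation as its feature vector and the matching IBM value as its label.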

Pages: 24

References (57 in total):

[1] Combined i-Vector and Extreme Learning Machine Approach for Robust Speaker Identification and Evaluation with SITW 2016, NIST 2008, TIMIT Databases [J].

Al-Kaltakchi, Musab T. S. ;

Abdullah, Mohammed A. M. ;

Woo, Wai L. ;

Dlay, Satnam S. .

CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2021, 40 (10) :4903-4923

[2] Comparisons of extreme learning machine and backpropagation-based i-vector approach for speaker identification [J].

Al-Kaltakchi, Musab T. S. ;

Al-Nima, Raid R. O. ;

Abdullah, Mohammed A. M. .

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2020, 28 (03) :1236-1245

[3] Thorough evaluation of TIMIT database speaker identification performance under noise with and without the G.712 type handset [J].

Al-Kaltakchi, Musab T. S. ;

Al-Nima, Raid Rafi Omar ;

Abdullah, Mohammed A. M. ;

Abdullah, Hikmat N. .

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (03) :851-863

[4] Evaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects [J].

Al-Kaltakchi, Musab T. S. ;

Woo, Wai L. ;

Dlay, Satnam ;

Chambers, Jonathon A. .

EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2017,

[5] The PASCAL CHiME speech separation and recognition challenge [J].

Barker, Jon ;

Vincent, Emmanuel ;

Ma, Ning ;

Christensen, Heidi ;

Green, Phil .

COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) :621-633

[6] Learning Deep Architectures for AI [J].

Bengio, Yoshua .

FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2009, 2 (01) :1-127

[7]

Bezdek J.C., 2002, LNCS: Vol. 2275. Proceedings of international conference on fuzzy systems (AFSS), Calcutta, V2275, P187, DOI 10.1007/3-540-45631-7_39

[8]

Bhatia R., 2013, MATRIX ANAL SPRINGER, V169, DOI 10.1007/978-1-4612-0653-8

[9] COMPUTATIONAL AUDITORY SCENE ANALYSIS [J].

BROWN, GJ ;

COOKE, M .

COMPUTER SPEECH AND LANGUAGE, 1994, 8 (04) :297-336

[10] A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks [J].

Du, Jun ;

Tu, Yanhui ;

Dai, Li-Rong ;

Lee, Chin-Hui .

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (08) :1424-1437
