Monaural speech separation using GA-DNN integration scheme

Cited by: 10

Authors:

Sivapatham, Shoba [1]

Ramadoss, Rajavel [1]

Kar, Asutosh [2]

Majhi, Banshidhar [3]

Affiliations:

[1] SSN Coll Engn, Dept Elect & Commun Engn, Kalavakkam, India

[2] Indian Inst Informat Technol Design & Mfg, Dept Elect & Commun Engn, Chennai, Tamil Nadu, India

[3] Indian Inst Informat Technol Design & Mfg, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India

Source:

APPLIED ACOUSTICS | 2020 / Volume 160

Keywords:

Genetic Algorithm; Deep Neural Network; Monaural Speech Separation; Segmentation; Voiced Speech; Unvoiced Speech; RECURRENT NEURAL-NETWORKS; PITCH TRACKING; ENHANCEMENT; INTELLIGIBILITY; SYSTEM; NOISE;

DOI:

10.1016/j.apacoust.2019.107140

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this work, we propose a model based on a Genetic Algorithm (GA) and a Deep Neural Network (DNN) to enhance the quality and intelligibility of noisy speech. In the proposed model, the Voiced Speech (VS) time-frequency (T-F) mask is computed using the correlogram, frame energy, and cross-channel correlogram, and the Unvoiced Speech (UVS) T-F mask is computed using speech onset/offset. The T-F mask obtained using speech onset and offset represents both the voiced and unvoiced segments of the noisy speech signal. The UVS T-F mask is obtained by subtracting the VS T-F mask from the T-F mask obtained earlier using speech onset/offset. Next, the GA is used to find the optimum weight for combining the VS and UVS T-F masks to improve speech quality and intelligibility. The weight obtained using the GA may not be optimal for all sets of speech and noise. This work addresses that issue and proposes a DNN model to estimate the optimum weight for all sets of speech and noise. The DNN model is trained using features and the optimum weights obtained using the GA. The trained DNN model is then used to estimate the optimum weight for the test speech and noise samples. The performance of the proposed GA-DNN based model is evaluated using objective and subjective quality and intelligibility measures. The results show that the proposed model improves speech quality and intelligibility by an average of 0.73, 4.07, 0.17, 0.26 and 0.22 for PESQ, SNR, STOI, CSII and NCM, respectively, when compared with existing speech separation systems. (C) 2019 Elsevier Ltd. All rights reserved.
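
The GA weight search described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the masks are combined as a convex combination `w * VS + (1 - w) * UVS`, the fitness is a stand-in (negative mean squared error against a reference mask, whereas the abstract reports quality and intelligibility measures), and the function names `combine_masks`, `fitness`, and `ga_search_weight` as well as the GA settings are hypothetical. The DNN stage that later predicts the weight is not shown.

```python
import numpy as np


def combine_masks(vs_mask, uvs_mask, w):
    """Combine VS and UVS T-F masks with a scalar weight w in [0, 1].

    The convex-combination form is an assumption for illustration only.
    """
    return np.clip(w * vs_mask + (1.0 - w) * uvs_mask, 0.0, 1.0)


def fitness(w, vs_mask, uvs_mask, target_mask):
    """Stand-in fitness: negative MSE between combined and reference mask."""
    combined = combine_masks(vs_mask, uvs_mask, w)
    return -np.mean((combined - target_mask) ** 2)


def ga_search_weight(vs_mask, uvs_mask, target_mask,
                     pop_size=20, generations=30, mutation_std=0.05, seed=0):
    """Search for a scalar weight with a small real-coded genetic algorithm."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, size=pop_size)  # initial candidate weights

    def score(population):
        return np.array(
            [fitness(w, vs_mask, uvs_mask, target_mask) for w in population]
        )

    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        order = np.argsort(score(pop))[::-1]
        parents = pop[order[: pop_size // 2]]

        # Crossover: random blend of two randomly chosen parents.
        n_children = pop_size - parents.size
        a = parents[rng.integers(0, parents.size, size=n_children)]
        b = parents[rng.integers(0, parents.size, size=n_children)]
        alpha = rng.uniform(0.0, 1.0, size=n_children)
        children = alpha * a + (1.0 - alpha) * b

        # Mutation: small Gaussian perturbation, kept inside [0, 1].
        children = np.clip(
            children + rng.normal(0.0, mutation_std, size=n_children), 0.0, 1.0
        )
        pop = np.concatenate([parents, children])

    return float(pop[np.argmax(score(pop))])


if __name__ == "__main__":
    # Synthetic masks, purely for demonstration.
    rng = np.random.default_rng(1)
    vs = rng.uniform(0.0, 1.0, size=(8, 16))
    uvs = rng.uniform(0.0, 1.0, size=(8, 16))
    reference = 0.7 * vs + 0.3 * uvs
    print("estimated weight:", ga_search_weight(vs, uvs, reference))
```

In a setting closer to the one the abstract describes, the per-sample weights found this way would serve as training targets for the DNN, paired with features extracted from the noisy speech.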


Pages: 11

