Speech Enhancement based on Deep Convolutional Neural Network

Cited by: 2
Authors
Nuthakki, Ramesh [1]
Masanta, Payel [1]
Yukta, T. N. [1]
Affiliations
[1] Atria Inst Technol, Dept Elect & Commun Engn, Bangalore, Karnataka, India
Source
PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021) | 2021
Keywords
Speech Enhancement; Deep Convolutional Neural Networks; Loss Functions; Speech Intelligibility; Harris Hawks Optimization; Coherence Speech Intelligibility Index; Estimated Short Time Objective Intelligibility; Mean Square Error; Perceptual Evaluation of Speech Quality; MEAN-SQUARE ERROR; INTELLIGIBILITY; ALGORITHM;
DOI
10.1109/I-SMAC52330.2021.9640736
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Speech enhancement is the process of treating noisy speech signals to improve both human perception and machine understanding of the signal. For speech signals with a medium or high signal-to-noise ratio (SNR), the aim is to produce a subjectively pragmatic signal; for signals with a low SNR, the aim is to reduce the noise while maintaining intelligibility. Many noise reduction algorithms improve overall speech quality, but little progress has been made in improving overall speech intelligibility. This paper proposes a deep convolutional neural network (DCNN) speech enhancement method that improves loss functions such as extended short-time objective intelligibility (ESTOI) and mean square error (MSE). These loss functions are improved using Harris Hawks Optimization (HHO). The enhanced speech signal is obtained by separating the clean speech signal from the noisy speech signal. The efficacy of speech enhancement is evaluated using several predictive measures of objective speech intelligibility: short-time objective intelligibility (STOI), source-to-artefact ratio (SAR), coherence speech intelligibility index (CSII), and source-to-distortion ratio (SDR). The quality of the enhanced speech signal is assessed using quality measures such as speech distortion (SD) and perceptual evaluation of speech quality (PESQ).
Pages: 770-775
Page count: 6