Combining spectral and temporal modification techniques for speech intelligibility enhancement

Cited by: 6
Authors
Cooke, Martin [1, 2]
Aubanel, Vincent [3]
Garcia Lecumberri, Maria Luisa [2]
Affiliations
[1] Ikerbasque Basque Sci Fdn, Bilbao, Spain
[2] Univ Basque Country, Language & Speech Lab, Vitoria 01006, Spain
[3] Univ Grenoble Alpes, Ctr Natl Rech Sci, GIPSA Lab, Grenoble, France
Keywords
Speech modification; Intelligibility; Retiming; Glimpsing; COCHLEA-SCALED ENTROPY; NOISE; CLEAR; INTENSITY
DOI
10.1016/j.csl.2018.10.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modifying clean speech prior to output in noisy conditions can lead to substantial intelligibility gains. Most algorithms operate by redistributing energy across the signal, leaving the timing of the underlying speech sounds intact. Other techniques do alter the timing of speech relative to the masker. Both classes of approach - spectral and temporal - lead to a reduction in energetic masking. The current study examines how their combination affects intelligibility. Arguments can be made for both synergy and redundancy, and the presence of distortions introduced by both spectral and temporal approaches might even lead to an antagonistic combination. A cohort of native Spanish listeners identified keywords in sentences in unmodified form and following spectral, temporal and spectro-temporal modification, in the presence of a fluctuating masker. Errors in the spectro-temporal condition were substantially lower than following spectral or temporal modification alone, with a three-fold reduction compared to unmodified speech. Spectro-temporal gains were observed for all phonemes. A glimpse-based model of energetic masking incorporating speech rate changes predicts intelligibility (r = .96), and a glimpsing analysis provides further insights into the distinct mechanisms through which spectral and temporal approaches lead to a release from energetic masking. (C) 2018 Elsevier Ltd. All rights reserved.
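The glimpsing idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes speech and masker levels are already available in dB on a shared time-frequency grid, it uses a hypothetical 3 dB local SNR criterion, and it ignores the speech rate changes that the study's model incorporates. The function name `glimpse_proportion` and its parameters are illustrative only.

```python
from typing import Sequence


def glimpse_proportion(
    speech_db: Sequence[Sequence[float]],
    masker_db: Sequence[Sequence[float]],
    threshold_db: float = 3.0,  # assumed local SNR criterion, not taken from the paper
) -> float:
    """Return the fraction of time-frequency cells where speech exceeds
    the masker by more than threshold_db (a "glimpse")."""
    if len(speech_db) != len(masker_db):
        raise ValueError("speech and masker must have the same number of frames")

    total = 0
    glimpsed = 0
    for speech_frame, masker_frame in zip(speech_db, masker_db):
        if len(speech_frame) != len(masker_frame):
            raise ValueError("frames must have the same number of channels")
        for s, m in zip(speech_frame, masker_frame):
            total += 1
            if s - m > threshold_db:  # local SNR above the criterion
                glimpsed += 1

    if total == 0:
        raise ValueError("inputs must contain at least one cell")
    return glimpsed / total


# Toy example: 2 frames x 3 frequency channels, levels in dB.
speech = [[60.0, 50.0, 40.0], [55.0, 45.0, 35.0]]
masker = [[50.0, 50.0, 50.0], [40.0, 48.0, 30.0]]
print(glimpse_proportion(speech, masker))
```

In this simplified view, a modification that raises speech energy where the masker is weak, or shifts speech in time toward dips in a fluctuating masker, would increase the returned proportion.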
Pages: 26-39
Number of pages: 14
Related papers
50 in total
  • [31] Spectral Tilt Estimation for Speech Intelligibility Enhancement Using RNN Based on All-Pole Model
    Zhang, Rui
    Hu, Ruimin
    Li, Gang
    Wang, Xiaochen
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296 : 144 - 156
  • [32] Spectral Dynamics Recovery for Enhanced Speech Intelligibility in Noise
    Petkov, Petko N.
    Kleijn, W. Bastiaan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (02) : 327 - 338
  • [33] SPEECH INTELLIGIBILITY ENHANCEMENT BY EQUALIZATION FOR IN-CAR APPLICATIONS
    Gentet, Enguerrand
    David, Bertrand
    Denjean, Sebastien
    Richard, Gael
    Roussarie, Vincent
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6934 - 6938
  • [34] Relationship between phoneme-level spectral acoustics and speech intelligibility in healthy speech: a systematic review
    Pommee, Timothy
    Balaguer, Mathieu
    Pinquier, Julien
    Mauclair, Julie
    Woisard, Virginie
    Speyer, Renee
    SPEECH LANGUAGE AND HEARING, 2021, 24 (02) : 105 - 132
  • [35] SIMPLE AND ARTEFACT-FREE SPECTRAL MODIFICATIONS FOR ENHANCING THE INTELLIGIBILITY OF CASUAL SPEECH
    Koutsogiannaki, Maria
    Stylianou, Yannis
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [36] Utilization of the Lombard effect in post-filtering for intelligibility enhancement of telephone speech
    Jokinen, Emma
    Alku, Paavo
    Vainio, Marti
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 590 - 593
  • [37] Multi-target ensemble learning based speech enhancement with temporal-spectral structured target
    Wang, Wenbo
    Guo, Weiwei
    Liu, Houguang
    Yang, Jianhua
    Liu, Songyong
    APPLIED ACOUSTICS, 2023, 205
  • [38] DNN-based monaural speech enhancement with temporal and spectral variations equalization
    Kang, Tae Gyoon
    Shin, Jong Won
    Kim, Nam Soo
    DIGITAL SIGNAL PROCESSING, 2018, 74 : 102 - 110
  • [39] MODIFICATION ON LSA SPEECH ENHANCEMENT FOR SPEECH RECOGNITION
    You, Chang Huai
    Ma, Bin
    Ni, Chongjia
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5475 - 5479
  • [40] Comparison of Gaussian process regression and Gaussian mixture models in spectral tilt modelling for intelligibility enhancement of telephone speech
    Jokinen, Emma
    Remes, Ulpu
    Alku, Paavo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 85 - 89