Combining spectral and temporal modification techniques for speech intelligibility enhancement

Cited by: 6

Authors:

Cooke, Martin [1,2]

Aubanel, Vincent [3]

Garcia Lecumberri, Maria Luisa [2]

Affiliations:

[1] Ikerbasque Basque Sci Fdn, Bilbao, Spain

[2] Univ Basque Country, Language & Speech Lab, Vitoria 01006, Spain

[3] Univ Grenoble Alpes, Ctr Natl Rech Sci, GIPSA Lab, Grenoble, France

Source:

COMPUTER SPEECH AND LANGUAGE | 2019 / Vol. 55

Keywords:

Speech modification; Intelligibility; Retiming; Glimpsing; COCHLEA-SCALED ENTROPY; NOISE; CLEAR; INTENSITY;

DOI:

10.1016/j.csl.2018.10.003

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modifying clean speech prior to output in noisy conditions can lead to substantial intelligibility gains. Most algorithms operate by redistributing energy across the signal, leaving the timing of the underlying speech sounds intact. Other techniques do alter the timing of speech relative to the masker. Both classes of approach - spectral and temporal - lead to a reduction in energetic masking. The current study examines how their combination affects intelligibility. Arguments can be made for both synergy and redundancy, and the presence of distortions introduced by both spectral and temporal approaches might even lead to an antagonistic combination. A cohort of native Spanish listeners identified keywords in sentences in unmodified form and following spectral, temporal and spectro-temporal modification, in the presence of a fluctuating masker. Errors in the spectro-temporal condition were substantially lower than following spectral or temporal modification alone, with a three-fold reduction compared to unmodified speech. Spectro-temporal gains were observed for all phonemes. A glimpse-based model of energetic masking incorporating speech rate changes predicts intelligibility (r = .96), and a glimpsing analysis provides further insights into the distinct mechanisms through which spectral and temporal approaches lead to a release from energetic masking. © 2018 Elsevier Ltd. All rights reserved.

Pages: 26-39

Number of pages: 14

