Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Cited by: 10

Authors:

Korvel, Grazina [1]

Kakol, Krzysztof [2, 3]

Kurasova, Olga [1]

Kostek, Bozena [2]

Affiliations:

[1] Vilnius Univ, Inst Data Sci & Digital Technol, LT-08412 Vilnius, Lithuania

[2] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Audio Acoust Lab, PL-80233 Gdansk, Poland

[3] PGS Software SA, PL-50086 Gdansk, Poland

Source:

IEEE ACCESS | 2020 / Vol. 8 / pp. 155156-155170

Keywords:

Speech enhancement; Harmonic analysis; Speech recognition; Context modeling; Noise measurement; Acoustics; Speech synthesis; Lombard speech; quality of experience; speech modeling techniques; SPEAKING STYLE CONVERSION; PARKINSONS-DISEASE; VOCODER; AUDIO; INTELLIGIBILITY; INTENSITY;

DOI:

10.1109/ACCESS.2020.3015421

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The Lombard effect is one of the best-known effects of noise on speech production. Speech produced with the Lombard effect is more easily recognized in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech in the context of speech enhancement. For this purpose, 100 utterances of natural speech and 100 utterances with the Lombard effect induced are used. The goal of this study is to determine to what extent speech utterances based on these models are recognizable and at what SNR (signal-to-noise ratio) threshold a particular model stops working. To this end, the synthesized models and Lombard speech are mixed with babble speech and street noise recordings at different SNRs. The quality of these models is measured using objective indicators as well as subjective tests. Since there is no standardized measure to apply to enhanced speech, an objective measure for assessing the speech quality of a model synthesizing Lombard speech characteristics, based on a feature vector, is proposed. Our approach is then compared with the standardized metric used in telecommunications as well as with subjective test results. The experimental investigations show the superiority of the source-filter models applied to synthesize Lombard speech over the other models used. Also, the proposed measure correlates more closely with the results of the subjective evaluation than the outcomes of the ITU-T P.563 recommendation. This was checked with an ANOVA statistical analysis.
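
The abstract describes mixing speech with babble and street noise at different SNRs but does not give the procedure or the SNR values. The following is only a minimal sketch of one common way to do such mixing, assuming a power-based SNR definition and noise that is looped to match the speech length. The function names (`mean_power`, `mix_at_snr`, `measured_snr_db`) and the synthetic signals are hypothetical and are not taken from the paper.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the target SNR in dB.

    Assumption: SNR = 10 * log10(P_speech / P_noise), with power averaged
    over the whole utterance. Noise is looped if shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop (or trim) the noise so it covers the whole utterance.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Gain that brings the noise power to P_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


# Synthetic stand-ins for real recordings (illustrative only).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

for target in (-5, 0, 10):  # hypothetical SNR levels, not the paper's
    mixture = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, mixture), 3))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, which is one reasonable design choice when comparing models; the paper may have done this differently.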

Pages: 155156-155170

Number of pages: 15
