Assessing the benefits of virtual speaker lateralization for binaural speech intelligibility over the Internet

Cited by: 0
Authors
Orduna-Bustamante, Felipe [1]
Padilla-Ortiz, A. L. [2]
Mena, Carlos [3]
Affiliations
[1] Univ Nacl Autonoma Mexico, Inst Ciencias Aplicadas & Tecnol, Circuito Exterior s-n,Ciudad Univ, Mexico City 04510, Mexico
[2] CONACyT CICESE, Unidad Monterrey, PIIT Apodaca, Alianza Ctr 504, Monterrey 66629, Nuevo Leon, Mexico
[3] Reykjavik Univ, Language & Voice Lab, Menntavegur 1, IS-102 Reykjavik, Iceland
Keywords
Binaural speech intelligibility; Virtual speaker lateralization; Spatial release from masking; Internet audio; Spatial release; Reception threshold; Noise source; Localization; Masking; Azimuth; Corpus; Head
DOI
10.1016/j.apacoust.2022.109146
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Binaural speech intelligibility tests using headphones were conducted remotely over the Internet, with virtual speaker lateralization at azimuth angles within ±45 degrees of the front, combined with babble noise at signal-to-noise ratios (SNR) from -12 to -4 dB. Monophonic speech recordings, selected from the publicly available Mozilla Common Voice speech corpus in Spanish, were binaurally processed using a database of generic Head-Related Transfer Functions (HRTF) to produce virtual speaker lateralization. A common babble noise signal was mixed into both the left and right binaural signals at different SNR. Speech intelligibility tests were conducted through a Google Forms questionnaire containing Internet links to YouTube videos embedding the corresponding test audio files. Symbolic text difference metrics commonly used in automatic speech recognition (ASR), namely Levenshtein distance and word error rate (WER), were used to automatically estimate the word-level intelligibility scores conventionally used in (human) speech intelligibility research, together with microscopic speech intelligibility scores at the phonetic symbol level. Speech reception thresholds (SRT) and intelligibility slopes were determined, showing that the known benefit of spatial release from masking (SRM), which improves speech intelligibility when the speaker is positioned at lateral azimuth angles relative to the listener, is preserved with virtual speaker lateralization and with Internet-transmitted audio heard through headphones. Results show that for azimuth angles of 20 and 30 degrees, left or right, speech intelligibility improves with an average unmasking benefit of 3.7 ± 0.7 dB SRM and an average intelligibility slope of 5.6 ± 0.8 %/dB, while at these angles the virtual speaker still gives a desirable impression of a reasonably frontal orientation. This provides a useful technique for improved Internet speech delivery. (c) 2022 Elsevier Ltd. All rights reserved.
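The text difference metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the lowercase whitespace tokenization, and the conversion of WER into a score as 1 - WER (clipped at zero) are all assumptions made here for illustration, since the abstract does not specify them. The same `levenshtein` function works on lists of words (word level) or on strings of symbols (symbol level).

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, response):
    """WER: word-level edit distance divided by the reference length."""
    # Assumption: simple lowercase whitespace tokenization.
    ref_words = reference.lower().split()
    hyp_words = response.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def word_score(reference, response):
    """Hypothetical intelligibility score, assumed here to be 1 - WER."""
    return max(0.0, 1.0 - word_error_rate(reference, response))
```

For example, a listener who types "el pato come" for the reference "el gato come pescado" makes one substitution and one deletion over four reference words, so `word_error_rate` returns 0.5.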
Pages: 8
Related papers
1 item in total
  • [1] Binaural speech intelligibility tests conducted remotely over the Internet compared with tests under controlled laboratory conditions
    Padilla-Ortiz, A. L.
    Orduna-Bustamante, Felipe
    APPLIED ACOUSTICS, 2021, 172