Acoustic compression in Zoom audio does not compromise voice recognition performance

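Times cited: 1
Authors
Perepelytsia, Valeriia [1]
Dellwo, Volker [1]
Affiliation
[1] Univ Zurich, Dept Computat Linguist, Andreasstr 15, CH-8050 Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
SPEAKER IDENTIFICATION; SPEECH; TELEPHONE; TRANSMISSION; EARWITNESSES; MEMORY; SIGNAL; IMPACT
DOI
10.1038/s41598-023-45971-x
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract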
Human voice recognition over telephone channels typically yields lower accuracy than recognition from higher-quality audio recorded in a studio environment. Here, we investigated the extent to which audio in video conferencing, subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old-new recognition task under three audio conditions (telephone, Zoom, studio) across all matched combinations (familiarization and test in the same audio condition) and mismatched combinations (familiarization and test in different audio conditions). Participants were familiarized with female voices presented as studio-quality (N = 22), Zoom-quality (N = 21), or telephone-quality (N = 20) stimuli. Subsequently, all listeners performed an identical voice recognition test containing a balanced stimulus set from all three conditions. Results revealed that voice recognition performance (d′) with Zoom audio did not differ significantly from that with studio audio, whereas with both Zoom and studio audio, listeners performed significantly better than with telephone audio. This suggests that the signal processing of the speech codec used by Zoom provides information as relevant to voice recognition as studio audio does. Interestingly, listeners familiarized with voices via Zoom audio showed a trend towards better recognition performance in the test (p = 0.056) than listeners familiarized with studio audio. We discuss future directions based on the possibility that an advantage of Zoom audio for voice recognition might be related to some of the speech coding mechanisms used by Zoom.
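Recognition performance is reported as d′, the signal-detection sensitivity index for an old-new task: d′ = z(H) − z(F), where H is the proportion of previously heard ("old") voices correctly judged old, F is the proportion of unheard ("new") voices wrongly judged old, and z is the inverse of the standard normal cumulative distribution function. The following is a minimal sketch of that computation. The function name `d_prime` and the log-linear correction for hit or false-alarm rates of exactly 0 or 1 are assumptions for illustration; the record does not state how the study handled extreme rates.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate) for an old-new task.

    Assumption: a log-linear correction (add 0.5 to each count and 1 to each
    total) keeps rates away from 0 and 1, where z would be infinite. This is
    illustrative and not necessarily the correction used in the study.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    hit_rate = (hits + 0.5) / (hits + misses + 1)  # "old" voices recognized as old
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)  # "new" voices called old
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 30 old and 30 new test trials (made-up counts).
print(round(d_prime(hits=24, misses=6, false_alarms=9, correct_rejections=21), 2))
```

Because d′ subtracts the false-alarm z-score from the hit z-score, a listener who answers "old" to everything gains nothing: a high hit rate is offset by an equally high false-alarm rate.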
Pages: 11