Eliciting and evaluating likelihood ratios for speaker recognition by human listeners under forensically realistic channel-mismatched conditions

Cited by: 2

Authors:

Hughes, Vincent [1]

Llamas, Carmen [1]

Kettig, Thomas [1]

Affiliations:

[1] Univ York, Dept Language & Linguist Sci, York, N Yorkshire, England

Source:

INTERSPEECH 2022 | 2022

Funding:

UK Arts and Humanities Research Council;

Keywords:

speaker recognition; forensic voice comparison; human listeners; likelihood ratio; validation; DISCRIMINATION;

DOI:

10.21437/Interspeech.2022-490

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper describes an experiment which elicits and then evaluates LR-like scores from non-expert human listeners in a speaker recognition task under conditions reflective of forensic casework. In doing so, it provides a framework for comparing and combining listener judgements with the output of automatic speaker recognition (ASR) systems (or other data-driven speaker recognition approaches). Stimuli consisted of 45 same-speaker and 45 different-speaker pairs of voices from young, male speakers of Standard Southern British English, using 10-second, channel-mismatched samples. 81 listeners provided ratings of the similarity between voices and their typicality within the wider accent population, which in turn were used to calculate LR-like scores. These scores were converted to log LRs via cross-validated logistic regression calibration. Overall, the human listeners produced an EER of 26.67% and a C-llr of 0.773. However, considerable variation was found across individual listeners (13.3-66.7% EER). Fusion of the listener judgements with an x-vector ASR system provided only marginal improvement in performance compared with the ASR system in isolation. Importantly, the magnitude of the four errors made by the ASR system was reduced because of the listener judgements. The implications of this work for forensics will be discussed.
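
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how an equal error rate (EER) and a log LR cost (C-llr) are commonly computed from calibrated log LRs. It is not the authors' code: the function names, the toy values, and the assumption that log LRs are natural logarithms are all illustrative, and the EER here is a simple threshold-sweep approximation rather than an interpolated one.

```python
import math


def cllr(ss_llrs, ds_llrs):
    """Log LR cost. Assumes natural-log LRs (an assumption of this sketch).

    Same-speaker (SS) pairs are penalised for low LRs and
    different-speaker (DS) pairs for high LRs.
    """
    ss_cost = sum(math.log2(1 + math.exp(-x)) for x in ss_llrs) / len(ss_llrs)
    ds_cost = sum(math.log2(1 + math.exp(x)) for x in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_cost + ds_cost)


def eer(ss_llrs, ds_llrs):
    """Approximate EER: sweep each observed score as a threshold and
    return the mean of the two error rates where they are closest."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(ss_llrs) | set(ds_llrs)):
        # False rejection: same-speaker pair scored below the threshold.
        frr = sum(x < t for x in ss_llrs) / len(ss_llrs)
        # False acceptance: different-speaker pair scored at or above it.
        far = sum(x >= t for x in ds_llrs) / len(ds_llrs)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# Hypothetical toy log LRs, not data from the paper.
toy_ss = [2.0, 1.0, -0.5]
toy_ds = [-2.0, -1.0, 0.5]
print(f"EER = {eer(toy_ss, toy_ds):.3f}, Cllr = {cllr(toy_ss, toy_ds):.3f}")
```

A system whose log LRs are all zero carries no information and yields a C-llr of 1, which is why values below 1, such as the 0.773 reported here, indicate that the scores are informative on balance.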

Pages: 5238-5242

Page count: 5