Sub-band parametric cepstral distance measurement of voiceless alveolar fricative segments as a tool for identifying speaker-characteristic information robust to emotional variation

Cited by: 0

Authors:

Keith, Emma [1]

Kinoshita, Yuko [1]

Affiliations:

[1] Australian Natl Univ, Canberra, Australia

Source:

INTERNATIONAL JOURNAL OF SPEECH LANGUAGE AND THE LAW | 2024 / Vol. 31 / Issue 02

Keywords:

Forensic Voice Comparison; speaker recognition; emotion; cepstrum; parametric cepstral distance; fricative; FEATURES; FORMANTS; SPEECH; NOISE;

DOI:

10.3138/ijsll-2024-0035

Chinese Library Classification (CLC) number:

DF [Law]; D9 [Law];

Discipline classification code:

0301;

Abstract:

Existing methods of identifying speaker-specific acoustic information in Forensic Voice Comparison (FVC), such as formant-based methods, suffer in performance when dealing with variation in speaker emotion. This is especially the case when it comes to the extremities of emotionally conditioned speech variation which often appear in FVC casework. Therefore, it is worthwhile to ask the question whether there exist particular aspects of speech data which are indicative of speaker-specific characteristics, but which remain robust to variation in emotional state. We present our investigation into the potential utility of the segment /s/ for this purpose, using the Parametric Cepstral Distance (PCD) method proposed by Clermont and Mokhtari (1994). We find that the cepstrum seems to contain speaker-specific information around 2-3 kHz, a range which also demonstrates strikingly little emotionally conditioned variation. We therefore suggest that the spectral characteristics of /s/ could be useful for FVC, where analysts are often required to compare emotionally varied recordings. The segment /i:/ is also used in order to ensure the reliability of the emotion coding groups established.

Pages: 267-290

Page count: 24

63 references in total

[1] Alzqhoul E. A. S., 2014, 15th Australasian International Conference on Speech Science Technology
[2] Speaker characteristics and emotion classification
Mustererkennung, Universität Erlangen-Nürnberg, Martensstr. 3, 91058 Erlangen, Germany
Unknown
[J]. Lect. Notes Comput. Sci., 2007 : 138 - 151
[3] ON THE ROLE OF THE AMPLITUDE OF THE FRICATIVE NOISE IN THE PERCEPTION OF PLACE OF ARTICULATION IN VOICELESS FRICATIVE CONSONANTS
BEHRENS, S
BLUMSTEIN, SE
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1988, 84 (03) : 861 - 867
[4] Class-level spectral features for emotion recognition
Bitouk, Dmitri
Verma, Ragini
Nenkova, Ani
[J]. SPEECH COMMUNICATION, 2010, 52 (7-8) : 613 - 625
[5] Boersma, 2023, PRAAT DOING PHONETIC
[6] Cahn J., 1983, Generating expression in synthesized speech
[7] Cao H., 2019, 19 INT C PHON SCI, P617
[8] Clermont F., 2022, P 18 AUSTR INT C SPE, P136
[9] Clermont F., 1994, Proceedings of the Vth Australian International Conference on Speech Science and Technology, V1, P354
[10] Speaker identification using vowels features through a combined method of formants, wavelets, and neural network classifiers
Daqrouq, Khaled
Tutunji, Tarek A.
[J]. APPLIED SOFT COMPUTING, 2015, 27 : 231 - 239
