Sub-band parametric cepstral distance measurement of voiceless alveolar fricative segments as a tool for identifying speaker-characteristic information robust to emotional variation

Cited by: 0
Authors
Keith, Emma [1]
Kinoshita, Yuko [1]
Affiliations
[1] Australian National University, Canberra, Australia
Keywords
Forensic Voice Comparison; speaker recognition; emotion; cepstrum; parametric cepstral distance; fricative; FEATURES; FORMANTS; SPEECH; NOISE
DOI
10.3138/ijsll-2024-0035
Chinese Library Classification
DF [Law]; D9 [Law]
Discipline classification code
0301
Abstract
Existing methods of identifying speaker-specific acoustic information in Forensic Voice Comparison (FVC), such as formant-based methods, suffer in performance when dealing with variation in speaker emotion. This is especially the case when it comes to the extremities of emotionally conditioned speech variation which often appear in FVC casework. Therefore, it is worthwhile to ask the question whether there exist particular aspects of speech data which are indicative of speaker-specific characteristics, but which remain robust to variation in emotional state. We present our investigation into the potential utility of the segment /s/ for this purpose, using the Parametric Cepstral Distance (PCD) method proposed by Clermont and Mokhtari (1994). We find that the cepstrum seems to contain speaker-specific information around 2-3 kHz, a range which also demonstrates strikingly little emotionally conditioned variation. We therefore suggest that the spectral characteristics of /s/ could be useful for FVC, where analysts are often required to compare emotionally varied recordings. The segment /i:/ is also used in order to ensure the reliability of the emotion coding groups established.
Pages: 267-290
Number of pages: 24
Related papers
63 in total
  • [1] Alzqhoul E. A. S., 2014, 15th Australasian International Conference on Speech Science Technology
  • [2] Speaker characteristics and emotion classification
    Mustererkennung, Universität Erlangen-Nürnberg, Martensstr. 3, 91058 Erlangen, Germany
    Author unknown
    [J]. Lect. Notes Comput. Sci., 2007 : 138 - 151
  • [3] ON THE ROLE OF THE AMPLITUDE OF THE FRICATIVE NOISE IN THE PERCEPTION OF PLACE OF ARTICULATION IN VOICELESS FRICATIVE CONSONANTS
    BEHRENS, S
    BLUMSTEIN, SE
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1988, 84 (03) : 861 - 867
  • [4] Class-level spectral features for emotion recognition
    Bitouk, Dmitri
    Verma, Ragini
    Nenkova, Ani
    [J]. SPEECH COMMUNICATION, 2010, 52 (7-8) : 613 - 625
  • [5] Boersma, 2023, PRAAT DOING PHONETIC
  • [6] Cahn J., 1983, Generating expression in synthesized speech
  • [7] Cao H., 2019, 19 INT C PHON SCI, P617
  • [8] Clermont F., 2022, P 18 AUSTR INT C SPE, P136
  • [9] Clermont F., 1994, Proceedings of the Vth Australian International Conference on Speech Science and Technology, V1, P354
  • [10] Speaker identification using vowels features through a combined method of formants, wavelets, and neural network classifiers
    Daqrouq, Khaled
    Tutunji, Tarek A.
    [J]. APPLIED SOFT COMPUTING, 2015, 27 : 231 - 239