Emulating Perceptual Evaluation of Voice Using Scattering Transform Based Features

Cited by: 2
Authors
Miramont, Juan Manuel [1]
Colominas, Marcelo Alejandro [1]
Schlotthauer, Gaston [1]
Affiliations
[1] UNER CONICET, Inst Res & Dev Bioengn & Bioinformat IBB, RA-3100 Oro Verde, Entre Rios, Argentina
Keywords
Task analysis; Protocols; Scattering; Perturbation methods; Transforms; Speech processing; Feature extraction; Total variation; scattering transform; support vector machines; voice quality; voice typing; FREQUENCY; QUALITY; RECOGNITION; DESIGN; MODELS; STATE
DOI
10.1109/TASLP.2022.3178239
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Voice health is traditionally assessed by methods that rely on the perception of a clinician, who integrates auditory and visual cues to reach a conclusion about the voice under evaluation. However, these tasks suffer from inter-professional variability due to their subjective nature, which is why more objective, computation-based methods are of interest. Two examples of such subjective tasks are the classification of voices into three types according to their periodicity, also termed voice typing, and the evaluation of six aspects of voice quality by means of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol. In this paper, two approaches to emulate each of those tasks are introduced, based on simple features extracted from scattering transform coefficients and support vector machines. First, a system for automatic voice typing was trained, and its classification performance was evaluated in intra- and inter-dataset trials using two widely known corpora. Accuracies above 80%, comparable to the state of the art, were found for all the experiments conducted. Second, a multidimensional, multioutput regression chain model was used to automatically grade the voice quality features of the CAPE-V protocol, obtaining errors and correlation coefficients comparable to those found for three human raters.
Pages: 1892-1901
Number of pages: 10