VARIABILITY COMPENSATION IN SMALL DATA: OVERSAMPLED EXTRACTION OF I-VECTORS FOR THE CLASSIFICATION OF DEPRESSED SPEECH

Cited by: 0
Authors
Cummins, Nicholas [1, 2]
Epps, Julien [1, 2]
Sethu, Vidhyasaharan [1 ]
Krajewski, Jarek [3 ]
Affiliations
[1] Univ New S Wales, Sch Elect Engn & Telecommun, Sydney, NSW, Australia
[2] Natl ICT Australia, ATP Res Lab, Sydney, NSW, Australia
[3] Univ Wuppertal, Expt Ind Psychol, Wuppertal, Germany
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Funding
Australian Research Council
Keywords
Depression; Acoustic Variability; I-vectors; Linear Discriminant Analysis; Within Class Covariance Normalisation; t-Distributed Stochastic Neighbour Embedding; RECOGNITION; SEVERITY;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Variations in the acoustic space due to changes in speaker mental state are potentially overshadowed by variability due to speaker identity and phonetic content. Using the Audio/Visual Emotion Challenge and Workshop 2013 Depression Dataset, we explore the suitability of i-vectors for reducing these latter sources of variability when distinguishing between low and high levels of speaker depression. In addition, we investigate whether supervised variability compensation methods such as Linear Discriminant Analysis (LDA) and Within Class Covariance Normalisation (WCCN), applied in the i-vector domain, could be used to compensate for speaker and phonetic variability. Classification results show that i-vectors formed using an over-sampling methodology outperform a baseline set by KL-means supervectors. However, these two compensation methods do not appear to improve system accuracy. Visualisations afforded by the t-Distributed Stochastic Neighbour Embedding (t-SNE) technique suggest that, despite the application of these techniques, speaker variability is still a strong confounding effect.
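As an illustration of the WCCN step named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes NumPy, uses synthetic 5-dimensional vectors in place of real i-vectors, and assumes two class labels (low and high depression); the function names, the ridge term, and all data values are hypothetical. WCCN is shown in its standard form: the averaged within-class covariance W is estimated, a matrix B with B Bᵀ = W⁻¹ is obtained by Cholesky decomposition, and each vector is projected with Bᵀ.

```python
import numpy as np


def within_class_covariance(X, y):
    """Average of the per-class covariance matrices (rows of X are vectors)."""
    classes = np.unique(y)
    dim = X.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        Xc = X[y == c]
        centred = Xc - Xc.mean(axis=0)
        W += centred.T @ centred / len(Xc)
    return W / len(classes)


def wccn_matrix(X, y, ridge=1e-8):
    """Return B such that B @ B.T equals the inverse within-class covariance.

    The small ridge term is an assumption added here for numerical stability.
    """
    W = within_class_covariance(X, y)
    W = W + ridge * np.eye(W.shape[0])
    return np.linalg.cholesky(np.linalg.inv(W))


# Synthetic stand-in for i-vectors: 40 vectors per class, 5 dimensions,
# with unequal variance per dimension so the normalisation has an effect.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.25])
X_low = rng.normal(size=(40, 5)) * scales
X_high = rng.normal(size=(40, 5)) * scales + 0.5
X = np.vstack([X_low, X_high])
y = np.array([0] * 40 + [1] * 40)  # 0 = low, 1 = high (assumed labels)

B = wccn_matrix(X, y)
X_wccn = X @ B  # row-wise equivalent of applying B.T to each vector

# After WCCN the averaged within-class covariance is (close to) the identity.
W_after = within_class_covariance(X_wccn, y)
print(np.round(np.diag(W_after), 3))
```

The projected vectors can then be passed to any classifier. Whether this normalisation helps on real data is an empirical question; the abstract reports that it did not improve accuracy in the authors' experiments.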
Pages: 5