Improvement on Speech Depression Recognition Based on Deep Networks

Cited by: 0
Authors
Li, Jinming [1]
Fu, Xiaoyan [2]
Shao, Zhuhong [3]
Shang, Yuanyuan [4]
Affiliations
[1] Capital Normal Univ, Coll Informat Engn, Beijing, Peoples R China
[2] Capital Normal Univ, Beijing Key Lab Elect Syst Reliabil Technol, Beijing, Peoples R China
[3] Capital Normal Univ, Beijing Adv Innovat Ctr Imaging Technol, Beijing, Peoples R China
[4] Capital Normal Univ, Beijing Engn Res Ctr High Reliable Embedded Syst, Beijing, Peoples R China
Source
2018 CHINESE AUTOMATION CONGRESS (CAC) | 2018
Funding
National Natural Science Foundation of China;
Keywords
automated depression diagnosis; speech processing; deep learning; feature extraction;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
To reduce the burden on clinicians, who must assess a large number of depressive symptoms, artificial intelligence researchers are increasingly interested in designing automatic depression recognition systems. The speech signals of depressed patients differ from those of healthy people. Here, we present a deep model, Depression AudioNet, which encodes depression-related characteristics of the vocal tract and provides a more comprehensive audio representation. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from the raw audio data. Second, robust emotion features are obtained with Multiscale Audio Delta Normalization (MADN), a data processing algorithm that we propose. Finally, the MFCCs and the emotion features of two adjacent local audio segments are fed into Depression AudioNet in turn to train the network. This method addresses the problems of limited training data and low precision by increasing the length information of each sample without reducing the number of samples. Experiments on the AVEC2014 dataset show that the proposed method is more effective and accurate than existing speech depression recognition algorithms.
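The abstract does not give the details of MADN or of the segment pairing, so the following is only a rough sketch of how such a pipeline might look, not the authors' implementation. It assumes the MFCCs are already available as a (frames x coefficients) array, replaced here by random numbers. The function names `delta`, `multiscale_delta_norm`, and `adjacent_pairs`, the scales (1, 2, 4), and the segment length of 20 frames are all hypothetical choices made for illustration.

```python
import numpy as np

def delta(features, scale):
    """Central difference between frames `scale` steps apart (edges repeated)."""
    padded = np.pad(features, ((scale, scale), (0, 0)), mode="edge")
    return (padded[2 * scale:] - padded[:-2 * scale]) / (2.0 * scale)

def multiscale_delta_norm(features, scales=(1, 2, 4)):
    """Hypothetical stand-in for MADN: deltas at several scales,
    each z-normalized per coefficient, then concatenated."""
    outputs = []
    for s in scales:
        d = delta(features, s)
        d = (d - d.mean(axis=0)) / (d.std(axis=0) + 1e-8)
        outputs.append(d)
    return np.concatenate(outputs, axis=1)

def adjacent_pairs(features, seg_len):
    """Cut into non-overlapping segments and join each with its successor,
    so every sample spans two adjacent segments."""
    n_seg = features.shape[0] // seg_len
    segs = features[: n_seg * seg_len].reshape(n_seg, seg_len, -1)
    return np.concatenate([segs[:-1], segs[1:]], axis=1)

# Placeholder for real MFCCs: 100 frames, 13 coefficients (assumed sizes).
mfcc = np.random.default_rng(0).normal(size=(100, 13))
emotion = multiscale_delta_norm(mfcc)               # 3 scales x 13 coefficients
combined = np.concatenate([mfcc, emotion], axis=1)  # MFCCs plus delta features
samples = adjacent_pairs(combined, seg_len=20)      # paired adjacent segments
```

Pairing N segments this way yields N - 1 samples that are each twice as long, which is one plausible reading of "increasing the length information of the sample without reducing the number of samples".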
Pages: 2705-2709
Page count: 5