Improvement on Speech Depression Recognition Based on Deep Networks

被引:0
|
作者
Li, Jinming [1 ]
Fu, Xiaoyan [2 ]
Shao, Zhuhong [3 ]
Shang, Yuanyuan [4 ]
机构
[1] Capital Normal Univ, Coll Informat Engn, Beijing, Peoples R China
[2] Capital Normal Univ, Beijing Key Lab Elect Syst Reliabil Technol, Beijing, Peoples R China
[3] Capital Normal Univ, Beijing Adv Innovat Ctr Imaging Technol, Beijing, Peoples R China
[4] Capital Normal Univ, Beijing Engn Res Ctr High Reliable Embedded Syst, Beijing, Peoples R China
来源
2018 CHINESE AUTOMATION CONGRESS (CAC) | 2018年
基金
中国国家自然科学基金;
关键词
automated depression diagnosis; speech processing; deep learning; feature extraction;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
To reduce the burden of clinicians diagnosing a large number of depressive symptoms, the field of artificial intelligence researchers are increasingly interested in designing automatic recognition systems for depression. Depressed patient have different speech signal from normal people. Here, we present a deep model, Depression AudioNet, which encodes depression-related features in the vocal tract and provides a more comprehensive audio representation. Firstly, the Mel-frequency cepstral coefficients (MFCCs) were extracted from raw audio data. Secondly, the robust emotions features were acquired by Multiscale Audio Delta Normalization (MADN), which is a data processing algorithm we proposed. Finally, the MFCCs and the emotions features of two adjacent segments of local audio were fed into the Depression AudioNet in turn to train the network. This method solves the problem of less training data and low precision by increasing the length information of the sample without reducing the number of samples. Experiments are conducted on AVEC2014 dataset, and the results shows that the proposed method is more effective and accurate than the existing speech depression recognition algorithms.
引用
收藏
页码:2705 / 2709
页数:5
相关论文
共 50 条
  • [21] Bidirectional deep architecture for Arabic speech recognition
    Zerari, Naima
    Abdelhamid, Samir
    Bouzgou, Hassen
    Raymond, Christian
    OPEN COMPUTER SCIENCE, 2019, 9 (01): : 92 - 102
  • [22] Multilingual Depression Detection Based on Speech Signals and Deep Leaming
    Liu, Lidan
    Tydeman, Florence
    Xie, Wangqing
    Wang, Yanzhong
    2024 IEEE 10TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND MACHINE LEARNING APPLICATIONS, BIGDATASERVICE 2024, 2024, : 115 - 116
  • [23] Deep learning based Affective Model for Speech Emotion Recognition
    Zhou, Xi
    Guo, Junqi
    Bie, Rongfang
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 841 - 846
  • [24] An Acoustic Model For English Speech Recognition Based On Deep Learning
    Ling, Zhang
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 610 - 614
  • [25] Improving Deep Learning based Automatic Speech Recognition for Gujarati
    Raval, Deepang
    Pathak, Vyom
    Patel, Muktan
    Bhatt, Brijesh
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [26] Feature Fusion of Speech Emotion Recognition Based on Deep Learning
    Liu, Gang
    He, Wei
    Jin, Bicheng
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 193 - 197
  • [27] Deep Convolution Neural Network Based Speech Recognition for Chhattisgarhi
    Londhe, Narendra D.
    Kshirsagar, Ghanahshyam B.
    Tekchandani, Hitesh
    2018 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2018, : 667 - 671
  • [28] SPEECH RECOGNITION FEATURES BASED ON DEEP LATENT GAUSSIAN MODELS
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [29] END-TO-END SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORKS
    Tzirakis, Panagiotis
    Zhang, Jiehao
    Schuller, Bjoern W.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5089 - 5093
  • [30] An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
    de-la-Calle-Silos, F.
    Gallardo-Antolin, A.
    Pelaez-Moreno, C.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 87 - 96