Depression Speech Recognition With a Three-Dimensional Convolutional Network

Cited by: 15
Authors
Wang, Hongbo [1 ]
Liu, Yu [1 ]
Zhen, Xiaoxiao [1 ]
Tu, Xuyan [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing Key Lab Knowledge Engn Mat Sci, Beijing, Peoples R China
Source
FRONTIERS IN HUMAN NEUROSCIENCE | 2021, Vol. 15
Funding
National Natural Science Foundation of China;
Keywords
depression detection; speech emotion recognition; multi-channel convolution; attention mechanism; deep learning; EMOTION; INDICATORS; SEVERITY;
DOI
10.3389/fnhum.2021.713823
CLC number
Q189 [Neuroscience];
Subject classification code
071006;
Abstract
Depression has become one of the main afflictions threatening people's mental health. However, traditional diagnostic methods have certain limitations, so an objective, intelligent-technology-based method of evaluating depression is needed to assist in the early diagnosis and treatment of patients. Because the abnormal speech features of patients with depression are related to their mental state to some extent, speech acoustic features are valuable as objective indicators for the diagnosis of depression. To address the complexity of depressed speech and the limited performance of traditional feature-extraction methods for speech signals, this article proposes a Three-Dimensional Convolutional filter bank with Highway Networks and Bidirectional GRU (Gated Recurrent Unit) with an Attention mechanism (3D-CBHGA for short), which includes two key strategies. (1) Three-dimensional feature extraction of the speech signal captures the expressive power of depression-related signals in a timely manner. (2) Based on the attention mechanism in the GRU network, the frame-level vectors are weighted by self-learning to obtain a hidden emotion vector. Experiments show that the proposed 3D-CBHGA establishes a good mapping from speech signals to depression-related features and improves the accuracy of depression detection from speech.
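The attention pooling in strategy (2) — weighting frame-level hidden vectors into a single "emotion vector" — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the dot-product scoring, the array shapes, and the names `attention_pool` and `w` are assumptions for demonstration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """Collapse frame-level vectors into one utterance-level vector.

    H: (T, d) array of frame-level hidden states (e.g. BiGRU outputs).
    w: (d,) learnable attention query vector (here fixed for illustration).
    """
    scores = H @ w           # (T,) one relevance score per frame
    alpha = softmax(scores)  # (T,) attention weights summing to 1
    return alpha @ H         # (d,) weighted sum = hidden "emotion vector"

# Toy usage: 50 frames of 8-dimensional hidden states.
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 8))
w = rng.standard_normal(8)
v = attention_pool(H, w)  # shape (8,)
```

In a trained model, `w` (and usually a small feed-forward scoring layer) is learned jointly with the GRU, so frames carrying more emotion-relevant information receive larger weights in the pooled vector.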
Pages: 15