Depression Speech Recognition With a Three-Dimensional Convolutional Network

Cited: 15
Authors
Wang, Hongbo [1]
Liu, Yu [1]
Zhen, Xiaoxiao [1]
Tu, Xuyan [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing Key Lab Knowledge Engn Mat Sci, Beijing, Peoples R China
Source
FRONTIERS IN HUMAN NEUROSCIENCE | 2021, Vol. 15
Funding
National Natural Science Foundation of China
Keywords
depression detection; speech emotion recognition; multi-channel convolution; attention mechanism; deep learning; EMOTION; INDICATORS; SEVERITY;
DOI
10.3389/fnhum.2021.713823
Chinese Library Classification (CLC)
Q189 [Neuroscience]
Subject Classification Code
071006
Abstract
Depression has become one of the main afflictions threatening people's mental health. Because traditional diagnostic methods have certain limitations, an objective, intelligent-technology-based assessment of depression is needed to assist the early diagnosis and treatment of patients. Since the abnormal speech features of patients with depression are related to their mental state, speech acoustic features are valuable objective indicators for the diagnosis of depression. To address the complexity of depressed speech and the limited performance of traditional feature extraction methods on speech signals, this article proposes a Three-Dimensional Convolutional filter bank with Highway networks and a Bidirectional GRU (Gated Recurrent Unit) with an Attention mechanism (3D-CBHGA for short), which rests on two key strategies. (1) Three-dimensional feature extraction of the speech signal captures depression-related information in the signal in a timely manner. (2) Using the attention mechanism on top of the GRU network, frame-level vectors are weighted through self-learning to obtain a hidden emotion vector. Experiments show that the proposed 3D-CBHGA establishes an effective mapping from speech signals to depression-related features and improves the accuracy of depression detection from speech.
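As an illustration of strategy (2), the sketch below shows, in PyTorch, one way an attention-weighted bidirectional GRU can pool frame-level features into a single utterance-level emotion vector for classification. The layer sizes, class and variable names, and the softmax form of the attention are assumptions made for this sketch, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveBiGRU(nn.Module):
    """Bidirectional GRU whose frame-level outputs are pooled with a
    learned attention weighting into one utterance-level emotion vector.
    Dimensions are illustrative only."""

    def __init__(self, feat_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.bigru = nn.GRU(feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # One attention score per frame, learned end to end (self-learning).
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, frames, feat_dim) frame-level features from the
        # convolutional front end.
        h, _ = self.bigru(x)                # (batch, frames, 2*hidden_dim)
        scores = self.attn(h).squeeze(-1)   # (batch, frames)
        weights = F.softmax(scores, dim=1)  # attention weights over frames
        emotion_vec = (weights.unsqueeze(-1) * h).sum(dim=1)
        return self.classifier(emotion_vec)  # depression-detection logits

# Example: 4 utterances, 200 frames each, 128-dimensional frame features.
model = AttentiveBiGRU()
logits = model(torch.randn(4, 200, 128))
print(logits.shape)  # torch.Size([4, 2])

In this sketch the attention weights replace simple average pooling over frames, letting the network emphasize the frames most indicative of the hidden emotional state.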
Pages: 15