Improvement on Speech Depression Recognition Based on Deep Networks

Cited by: 0

Authors:

Li, Jinming [1]

Fu, Xiaoyan [2]

Shao, Zhuhong [3]

Shang, Yuanyuan [4]

Affiliations:

[1] Capital Normal Univ, Coll Informat Engn, Beijing, Peoples R China

[2] Capital Normal Univ, Beijing Key Lab Elect Syst Reliabil Technol, Beijing, Peoples R China

[3] Capital Normal Univ, Beijing Adv Innovat Ctr Imaging Technol, Beijing, Peoples R China

[4] Capital Normal Univ, Beijing Engn Res Ctr High Reliable Embedded Syst, Beijing, Peoples R China

Source:

2018 CHINESE AUTOMATION CONGRESS (CAC) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

automated depression diagnosis; speech processing; deep learning; feature extraction;

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

To reduce the burden on clinicians of diagnosing large numbers of patients with depressive symptoms, researchers in artificial intelligence are increasingly interested in designing automatic depression recognition systems. The speech signals of depressed patients differ from those of healthy people. Here, we present a deep model, Depression AudioNet, which encodes depression-related features of the vocal tract and provides a more comprehensive audio representation. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from the raw audio data. Second, robust emotion features are obtained by Multiscale Audio Delta Normalization (MADN), a data processing algorithm that we propose. Finally, the MFCCs and the emotion features of two adjacent segments of local audio are fed into Depression AudioNet in turn to train the network. This method addresses the problems of limited training data and low precision by increasing the length information of each sample without reducing the number of samples. Experiments are conducted on the AVEC2014 dataset, and the results show that the proposed method is more effective and accurate than existing speech depression recognition algorithms.
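The idea of building longer samples from adjacent segments while keeping the sample count nearly unchanged can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how segments are cut or combined, so the function names `split_segments` and `pair_adjacent`, the segment length, and the choice to concatenate neighbours are all hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per audio frame (e.g. MFCCs)


def split_segments(frames: Sequence[Frame], seg_len: int) -> List[List[Frame]]:
    """Cut a frame sequence into consecutive, non-overlapping segments.

    Assumption: an incomplete tail shorter than seg_len is dropped.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    n_segments = len(frames) // seg_len
    return [list(frames[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]


def pair_adjacent(segments: Sequence[List[Frame]]) -> List[List[Frame]]:
    """Join every segment with its successor (assumed combination rule).

    n segments yield n - 1 samples, each twice as long as one segment.
    """
    return [segments[i] + segments[i + 1] for i in range(len(segments) - 1)]


# Toy input: 10 frames with a single feature each.
frames = [[float(t)] for t in range(10)]
segments = split_segments(frames, seg_len=2)
samples = pair_adjacent(segments)
print(len(segments), len(samples), len(samples[0]))
```

With this scheme each sample covers twice as many frames as a single segment, while the number of samples drops by only one per recording.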


Pages: 2705 - 2709

Page count: 5

