Improvement on Speech Depression Recognition Based on Deep Networks

Cited by: 0

Authors:

Li, Jinming [1]

Fu, Xiaoyan [2]

Shao, Zhuhong [3]

Shang, Yuanyuan [4]

Affiliations:

[1] Capital Normal Univ, Coll Informat Engn, Beijing, Peoples R China

[2] Capital Normal Univ, Beijing Key Lab Elect Syst Reliabil Technol, Beijing, Peoples R China

[3] Capital Normal Univ, Beijing Adv Innovat Ctr Imaging Technol, Beijing, Peoples R China

[4] Capital Normal Univ, Beijing Engn Res Ctr High Reliable Embedded Syst, Beijing, Peoples R China

Source:

2018 CHINESE AUTOMATION CONGRESS (CAC) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

automated depression diagnosis; speech processing; deep learning; feature extraction;

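DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract: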

To reduce the burden on clinicians who must diagnose large numbers of patients with depressive symptoms, researchers in artificial intelligence are increasingly interested in designing automatic depression recognition systems. The speech signals of depressed patients differ from those of people without depression. Here, we present a deep model, Depression AudioNet, which encodes depression-related features of the vocal tract and provides a more comprehensive audio representation. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from the raw audio data. Second, robust emotion features are obtained with Multiscale Audio Delta Normalization (MADN), a data processing algorithm that we propose. Finally, the MFCCs and the emotion features of two adjacent segments of local audio are fed in turn into Depression AudioNet to train the network. This method addresses the problems of limited training data and low precision by increasing the length information of each sample without reducing the number of samples. Experiments are conducted on the AVEC2014 dataset, and the results show that the proposed method is more effective and accurate than existing speech depression recognition algorithms.
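The abstract does not say how the two adjacent segments are combined, so the following is only a minimal sketch of one possible reading: a frame-level feature matrix is cut into fixed-length segments, and each segment is concatenated with its successor along the time axis. The function name `pair_adjacent_segments`, the segment length, and the 13-coefficient input are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def pair_adjacent_segments(features: np.ndarray, seg_len: int) -> np.ndarray:
    """Cut a (frames, coeffs) matrix into segments and pair each with the next.

    Hypothetical helper: the paper's actual pairing scheme is not given
    in the abstract.
    """
    n_segments = features.shape[0] // seg_len
    # Drop trailing frames that do not fill a whole segment.
    trimmed = features[: n_segments * seg_len]
    segments = trimmed.reshape(n_segments, seg_len, -1)
    # Pair segment i with segment i + 1; consecutive pairs share one segment.
    return np.concatenate([segments[:-1], segments[1:]], axis=1)


# Assumed input: 100 frames of 13 coefficients (placeholder values, not real MFCCs).
frames = np.arange(100 * 13, dtype=np.float32).reshape(100, 13)
pairs = pair_adjacent_segments(frames, seg_len=20)
print(pairs.shape)
```

Under this reading, N segments yield N - 1 overlapping pairs, each twice as long as a single segment, which is one way a sample could carry more temporal context while the sample count stays almost unchanged.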


Pages: 2705 - 2709

Number of pages: 5
