Improvement on Speech Depression Recognition Based on Deep Networks

Authors
Li, Jinming [1]
Fu, Xiaoyan [2]
Shao, Zhuhong [3]
Shang, Yuanyuan [4]
Affiliations
[1] Capital Normal Univ, Coll Informat Engn, Beijing, Peoples R China
[2] Capital Normal Univ, Beijing Key Lab Elect Syst Reliabil Technol, Beijing, Peoples R China
[3] Capital Normal Univ, Beijing Adv Innovat Ctr Imaging Technol, Beijing, Peoples R China
[4] Capital Normal Univ, Beijing Engn Res Ctr High Reliable Embedded Syst, Beijing, Peoples R China
Source
2018 CHINESE AUTOMATION CONGRESS (CAC) | 2018
Funding
National Natural Science Foundation of China
Keywords
automated depression diagnosis; speech processing; deep learning; feature extraction;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To reduce the burden on clinicians of diagnosing large numbers of cases with depressive symptoms, artificial intelligence researchers are increasingly interested in designing automatic depression recognition systems. The speech signals of depressed patients differ from those of non-depressed people. Here, we present a deep model, Depression AudioNet, which encodes depression-related features of the vocal tract and provides a more comprehensive audio representation. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from the raw audio data. Second, robust emotion features are obtained with Multiscale Audio Delta Normalization (MADN), a data processing algorithm that we propose. Finally, the MFCCs and the emotion features of two adjacent local audio segments are fed in turn into Depression AudioNet to train the network. This method addresses the problems of scarce training data and low precision by increasing the length of each sample without reducing the number of samples. Experiments on the AVEC2014 dataset show that the proposed method is more effective and accurate than existing speech depression recognition algorithms.
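The abstract names the processing steps but does not define MADN or the segment layout, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a precomputed frame-by-coefficient matrix standing in for MFCCs, takes frame differences ("deltas") at several lags, z-score normalizes each coefficient, and pairs each segment with its successor. The function names, the lags `(1, 2, 4)`, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def delta(frames, lag):
    """Difference between each frame and the frame `lag` steps later."""
    return [
        [b - a for a, b in zip(frames[t], frames[t + lag])]
        for t in range(len(frames) - lag)
    ]


def zscore_columns(frames):
    """Normalize each coefficient (column) to zero mean and unit variance."""
    columns = list(zip(*frames))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def multiscale_normalized_deltas(frames, lags=(1, 2, 4)):
    """Hypothetical stand-in for MADN: normalized deltas at several lags."""
    return {lag: zscore_columns(delta(frames, lag)) for lag in lags}


def adjacent_segment_pairs(frames, seg_len):
    """Split into non-overlapping segments and pair each with its successor."""
    segments = [
        frames[i:i + seg_len]
        for i in range(0, len(frames) - seg_len + 1, seg_len)
    ]
    return list(zip(segments, segments[1:]))


# Toy "MFCC" matrix: 8 frames x 2 coefficients (made-up numbers, not real audio).
toy = [[float(t), float(t * t)] for t in range(8)]
deltas = multiscale_normalized_deltas(toy)
pairs = adjacent_segment_pairs(toy, seg_len=2)
```

In this sketch, four segments yield three adjacent pairs, so each training sample covers twice as many frames while the sample count drops by only one. That is one way to read the abstract's claim about lengthening samples without reducing their number.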
Pages: 2705-2709
Page count: 5