Smart voice recognition based on deep learning for depression diagnosis

Cited by: 0
Authors
Sukit Suparatpinyo
Nuanwan Soonthornphisaj
Affiliation
[1] Kasetsart University, Department of Computer Science, Faculty of Science
Source
Artificial Life and Robotics | 2023 / Vol. 28, No. 2
Keywords
Deep residual network; Spectrograph; Depression; Recognition; Audio file
DOI
Not available
Abstract
Depressive disorder is a mental illness with a high incidence rate, driven by environmental stress and social pressure. Depression affects mood and behavior, leading to problems in domains such as education, family, and the workplace. Suicide attempts also occur in severe cases. However, depression is treatable once it has been diagnosed by a psychiatrist. In Thailand, many people who are aware of mental disorders do not seek help from psychiatric hospitals because of long waiting times and high fees. We therefore aim to create an application that lets users perform a self-assessment from their voice signal. In our experiment, the positive class is voice data recorded from depressive patients during therapy sessions in a psychiatric hospital, and the negative class is voice data from non-depressive people, recorded during interview sessions with university students. Each audio file is rendered as a spectrograph, a visual representation of the power spectrum. Here the power spectrum consists of the Mel-frequency cepstral coefficients (MFCCs) extracted from the time-varying human voice using the fast Fourier transform (FFT) and the discrete cosine transform (DCT). Because some studies claim that the DCT causes some spectral features to be lost, we empirically compare spectrograph sets produced with and without the DCT. Other studies state that a larger window reveals more detail of speech activity in the power spectrum, which affects depression-detection performance, so we use the Blackman-Harris and Blackman window functions to create different sets of spectrographs and test that idea on a Thai speech dataset. Deep learning models based on the deep residual network (ResNet) are explored to assess their potential for classification. Networks of different depths, namely ResNet-34, ResNet-50, and ResNet-101, are examined.
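The feature-extraction choices described above (Blackman versus Blackman-Harris windowing, and DCT versus non-DCT features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 64-sample frame, and the synthetic sine input are assumptions made for illustration, a naive DFT stands in for the FFT, and the Mel filterbank step of a full MFCC pipeline is omitted for brevity.

```python
import cmath
import math


def blackman(n_points):
    """Classic Blackman window with coefficients 0.42, 0.5, 0.08."""
    m = n_points - 1
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / m)
            + 0.08 * math.cos(4 * math.pi * n / m) for n in range(n_points)]


def blackman_harris(n_points):
    """Four-term Blackman-Harris window."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    m = n_points - 1
    return [a0 - a1 * math.cos(2 * math.pi * n / m)
            + a2 * math.cos(4 * math.pi * n / m)
            - a3 * math.cos(6 * math.pi * n / m) for n in range(n_points)]


def power_spectrum(frame, window):
    """Power spectrum of one windowed frame via a naive DFT.

    An FFT returns the same values faster; the DFT keeps the sketch short.
    """
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def dct2(values):
    """Unnormalised DCT-II of a sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values)) for k in range(n)]


def frame_features(frame, window, apply_dct):
    """Log power spectrum of a frame, optionally followed by the DCT.

    apply_dct=False corresponds to the "non-DCT" variant: the log spectrum
    is kept as it is. The Mel filterbank is left out (assumption).
    """
    log_power = [math.log(p + 1e-10) for p in power_spectrum(frame, window)]
    return dct2(log_power) if apply_dct else log_power


# Hypothetical input: a sine with exactly 8 cycles in a 64-sample frame.
frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
with_dct = frame_features(frame, blackman(64), apply_dct=True)
without_dct = frame_features(frame, blackman_harris(64), apply_dct=False)
print(len(with_dct), len(without_dct))
```

Stacking such per-frame feature vectors over time gives the two-dimensional image that an image classifier can take as input.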
The experimental results show that both ResNet-50 models, each trained on a different type of spectrograph, achieve an F1-score above 70%, the best performance among the approaches tested. The model trained on spectrographs extracted with the Blackman window function and without the DCT provides the best sensitivity, at 74.45%. To the best of our knowledge, our approach gives the highest F1-score when compared with state-of-the-art methods.
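For reference, the two metrics reported above can be computed from binary predictions as in the sketch below. The labels are made-up illustrative values, not data from the paper, and the function names are hypothetical; class 1 stands for the depressive (positive) class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Recall on the positive class: the share of true positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels for illustration only (1 = depressive, 0 = non-depressive).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), f1_score(tp, fp, fn))
```

Sensitivity matters for a screening tool because a false negative means a depressive speaker goes undetected, while the F1-score also penalises false alarms.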
Pages: 332-342
Page count: 10