Using SincNet for Learning Pathological Voice Disorders

Times cited: 9
Authors
Hung, Chao-Hsiang [1]
Wang, Syu-Siang [1]
Wang, Chi-Te [2]
Fang, Shih-Hau [1]
Affiliations
[1] Yuan Ze Univ, Dept Elect Engn, Taoyuan 320, Taiwan
[2] Far Eastern Mem Hosp, Dept Otolaryngol Head & Neck Surg, New Taipei 220, Taiwan
Keywords
pathological voice; classification; sinc functions; convolutional neural network; SincNet
DOI
10.3390/s22176634
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
Deep learning techniques such as convolutional neural networks (CNNs) have been successfully applied to identify pathological voices. However, a major disadvantage of these advanced models is their lack of interpretability: they offer little explanation of their predicted outcomes. This drawback creates a bottleneck for the adoption of voice-disorder classification and detection systems, especially during the current pandemic. In this paper, we propose replacing the very first layer of a commonly used CNN with a series of learnable sinc functions, yielding an explainable SincNet system for classifying or detecting pathological voices. The sinc filters, which serve as the front-end signal processor of SincNet, are critical for constructing a meaningful first layer: they extract acoustic features directly from the input waveform, and the subsequent network layers use these features to generate high-level voice information. We conducted our tests on three different voice datasets from Far Eastern Memorial Hospital. In our evaluations, the proposed approach achieves accuracy and sensitivity improvements of up to 7% and 9%, respectively, over conventional methods, demonstrating the superior performance of the SincNet system in predicting pathological input waveforms. More importantly, based on our evaluation results, we offer possible explanations of the relationship between the system output and the speech features extracted by the first layer.
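The sinc-based first layer described in the abstract can be sketched as follows. This is a minimal NumPy illustration assuming the standard SincNet formulation, in which each filter is a Hamming-windowed band-pass filter formed as the difference of two low-pass sinc filters, g[n] = 2 f2 sinc(2π f2 n) − 2 f1 sinc(2π f1 n) with sinc(x) = sin(x)/x, and only the cutoffs f1 and f2 are learned. The function names `sinc_bandpass` and `sinc_layer`, the 16 kHz sampling rate, the 251-tap filter length, and the band edges are hypothetical illustration choices, not values taken from this paper, and the cutoffs are fixed here rather than trained.

```python
import numpy as np


def sinc_bandpass(f1_hz, f2_hz, sample_rate=16000, length=251):
    """Hamming-windowed band-pass FIR filter with pass band [f1_hz, f2_hz].

    Built as the difference of two ideal low-pass sinc filters, as in SincNet.
    All default values are illustrative assumptions, not the paper's settings.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the filter is centred on n = 0")
    if not 0 <= f1_hz < f2_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f1_hz < f2_hz <= Nyquist")
    # Normalised cutoffs in cycles per sample.
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate
    n = np.arange(length) - (length - 1) // 2  # symmetric time index
    # np.sinc(x) = sin(pi*x) / (pi*x), so 2*f*np.sinc(2*f*n) is the ideal
    # low-pass impulse response with cutoff f (unit gain in the pass band).
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Window to reduce ripple caused by truncating the ideal response.
    return g * np.hamming(length)


def sinc_layer(waveform, bands, sample_rate=16000, length=251):
    """Filter a raw waveform with one sinc band-pass filter per (f1, f2) band.

    Returns an array of shape (len(bands), len(waveform)) when the waveform
    is at least `length` samples long.
    """
    return np.stack([
        np.convolve(waveform, sinc_bandpass(f1, f2, sample_rate, length), mode="same")
        for f1, f2 in bands
    ])


# Usage: a 1 kHz tone mostly passes the 300-3400 Hz band and is
# strongly attenuated by the 4000-7000 Hz band.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
features = sinc_layer(tone, [(300, 3400), (4000, 7000)], sample_rate=sr)
energy = (features ** 2).sum(axis=1)
print(features.shape)  # (2, 16000)
print(energy[0] > 100 * energy[1])  # True
```

Because each filter is fully described by its two cutoff frequencies, the band edges of a trained layer can be read off directly, which is the basis for the interpretability the abstract emphasizes. In a trainable version, f1 and f2 would be parameters updated by backpropagation rather than fixed constants.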
Pages: 18