MMHFNet: Multi-modal and multi-layer hybrid fusion network for voice pathology detection

Cited by: 14
Authors
Mohammed, Hussein M. A. [1]
Omeroglu, Asli Nur [1]
Oral, Emin Argun [1, 2]
Affiliations
[1] Ataturk Univ, Dept Elect & Elect Engn, TR-25240 Yakutiye, Erzurum, Turkiye
[2] Ataturk Univ, High Performance Comp Applicat & Res Ctr, TR-25240 Yakutiye, Erzurum, Turkiye
Keywords
Voice pathology detection; Multi-modal data fusion; Multi-layer fusion; Deep learning; CNN; LSTM; NEURAL-NETWORKS; HEALTH-CARE; INFORMATION; CLASSIFICATION; IDENTIFICATION
DOI
10.1016/j.eswa.2023.119790
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic voice pathology detection using non-invasive techniques that utilize patients' speech and electroglottograph (EGG) signals plays a vital role in diagnosis and early medical intervention. In this paper, a novel deep Multi-Modal and Multi-Layer Hybrid Fusion Network (MMHFNet) is proposed to improve the performance of non-invasive voice pathology detection systems. MMHFNet simultaneously incorporates complementary information from different modalities (speech and EGG signals). For multi-layer fusion, it also vertically combines low-level features, extracted from shallow layers, with high-level features, extracted from deep layers, to take full advantage of the spatio-spectral information of different layers. The features extracted by MMHFNet are then fed into an LSTM classification network to diagnose the voice pathology. Comprehensive experiments are conducted on the publicly available Saarbruecken Voice Database (SVD) to evaluate the performance of the proposed MMHFNet. The dataset is used in two ways: once with all of its samples, and once with a selected subset of samples that forms the largest balanced SVD dataset. Experimental results demonstrate that the proposed MMHFNet achieves accuracy rates of 91% and 96.05% on the datasets with all and balanced samples, respectively.
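The hybrid fusion idea in the abstract can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the functions `conv1d`, `avg_pool`, `branch_features`, and `hybrid_fusion`, the kernels, and the frame count are hypothetical toy stand-ins, not the layers or hyperparameters of MMHFNet. The sketch only shows the data flow: each modality yields shallow and deep features, these are combined per time step (multi-layer fusion), and the two modalities are then concatenated (multi-modal fusion) into a sequence of the kind an LSTM classifier could consume.

```python
import math

# Illustrative only: toy stand-ins for the feature extractors, NOT the MMHFNet layers.

def conv1d(signal, kernel):
    """Valid 1-D cross-correlation followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def avg_pool(values, size):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]

def to_frames(values, n_frames):
    """Summarize a feature map as n_frames averaged time steps."""
    size = len(values) // n_frames
    return [sum(values[f * size:(f + 1) * size]) / size for f in range(n_frames)]

def branch_features(signal, n_frames=4):
    """Multi-layer fusion for one modality: pair shallow and deep features per step."""
    shallow = conv1d(signal, [1.0, -1.0])                  # low-level: local differences
    deep = conv1d(avg_pool(shallow, 2), [0.5, 0.5, 0.5])   # high-level: wider context
    return [[s, d] for s, d in zip(to_frames(shallow, n_frames),
                                   to_frames(deep, n_frames))]

def hybrid_fusion(speech, egg, n_frames=4):
    """Multi-modal fusion: concatenate the per-step features of both modalities."""
    return [s + e for s, e in zip(branch_features(speech, n_frames),
                                  branch_features(egg, n_frames))]

# Synthetic placeholder signals (assumed; real inputs would be speech and EGG recordings).
speech = [math.sin(0.3 * t) for t in range(64)]
egg = [math.sin(0.3 * t + 0.5) for t in range(64)]
sequence = hybrid_fusion(speech, egg)  # 4 time steps, 4 fused features per step
```

In this sketch `sequence` plays the role of the fused feature sequence; the LSTM classification stage described in the abstract is deliberately left out.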
Pages: 13