Multi-modal voice pathology detection architecture based on deep and handcrafted feature fusion

Cited by: 25
Authors
Omeroglu, Asli Nur [1]
Mohammed, Hussein M. A. [1]
Oral, Emin Argun [1]
Affiliations
[1] Ataturk Univ, Dept Elect & Elect Engn, TR-25240 Erzurum, Turkey
Source
ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH | 2022 / Vol. 36
Keywords
Artificial intelligence; Deep learning; Multi-modal; Saarbruecken Voice Database (SVD); Voice pathology detection and classification; SPEAKER RECOGNITION; MFCC
DOI
10.1016/j.jestch.2022.101148
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Automatic voice pathology detection systems can help clinicians by enabling objective assessment and diagnosis at an early stage of voice pathologies. This paper proposes a novel multi-modal architecture that uses speech and electroglottography (EGG) signals and investigates their effectiveness in the automatic detection of voice pathology. The proposed multi-modal framework combines two parallel Convolutional Neural Networks (CNNs), one for voice signals and the other for EGG signals, to obtain deep features. Classical handcrafted features are also extracted from both signals. These features are then concatenated to obtain a more discriminative feature set. In addition, a feature selection method is applied to remove redundant features. Finally, an SVM classifier is used to detect the voice pathology. To measure the performance of the proposed pathology detection system, various experiments are conducted on the Saarbruecken Voice Database (SVD) without excluding any available pathology or sample. The experimental results show that the proposed voice pathology detection method achieves an accuracy of up to 90.10% using all speech and EGG samples. Sensitivity, specificity and F1-score results of 92.9%, 84.6% and 92.57%, respectively, are also obtained. The proposed method performs better than methods reported in the literature that use all SVD samples with cross-validation testing. Hence, it is promising for automatic voice pathology detection applications. © 2022 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
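The fusion, redundancy removal, and evaluation steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature values and confusion counts are made up, the correlation filter is a stand-in because the abstract does not name the feature selection method, and the function names (`fuse`, `drop_redundant`, `detection_metrics`) are hypothetical. The CNN feature extractors and the SVM classifier are left out.

```python
from math import sqrt


def fuse(*feature_sets):
    """Concatenate per-sample feature vectors from several sources."""
    n = len(feature_sets[0])
    if any(len(fs) != n for fs in feature_sets):
        raise ValueError("all feature sets must cover the same samples")
    fused = []
    for i in range(n):
        row = []
        for fs in feature_sets:
            row.extend(fs[i])
        fused.append(row)
    return fused


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_redundant(rows, threshold=0.95):
    """Assumed stand-in selector: keep a column only if its absolute
    correlation with every already-kept column is below the threshold."""
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept, [[row[j] for j in kept] for row in rows]


def detection_metrics(tp, fn, tn, fp):
    """Metrics named in the abstract, computed from confusion counts
    (pathological = positive class)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Made-up feature vectors for four recordings (illustrative only).
deep_speech = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5], [0.3, 0.7]]
deep_egg = [[0.2, 0.5], [0.8, 0.1], [1.6, 0.3], [0.6, 0.9]]
handcrafted = [[1.0], [3.0], [2.0], [5.0]]

fused = fuse(deep_speech, deep_egg, handcrafted)
kept, selected = drop_redundant(fused)
print(kept)  # [0, 1, 3, 4]: column 2 is a scaled copy of column 0

# Hypothetical confusion counts, not taken from the paper.
metrics = detection_metrics(tp=90, fn=10, tn=40, fp=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

In a full pipeline, the rows of `selected` would be passed to the classifier, and the confusion counts would come from its cross-validated predictions.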
Pages: 11