A depthwise separable CNN-based interpretable feature extraction network for automatic pathological voice detection

Cited by: 10
Authors
Zhao, Denghuang [1]
Qiu, Zhixin [1]
Jiang, Yujie [1]
Zhu, Xincheng [1]
Zhang, Xiaojun [1]
Tao, Zhi [1]
Affiliations
[1] Soochow Univ, 1 Shizi St, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Pathological voice detection; Deep learning; Interpretability; Depthwise separable CNN; CLASSIFICATION; INFORMATION; CEPSTRUM; VOWEL
DOI
10.1016/j.bspc.2023.105624
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In recent years, deep learning methods for automatic pathological voice detection (APVD) have achieved satisfactory results. However, most deep learning methods in APVD cannot explain their performance, and interpretability is crucial for deep learning methods applied in the medical field. This lack of interpretability makes it hard for existing methods to generalize better than methods based on meaningful hand-crafted features in practical applications. This paper proposes an interpretable neural network architecture, the Interpretable Multi-band Feature Extraction Network (IMBFN), which is built on a clear feature extraction logic and a comprehensive result judgment method, to improve the effectiveness and generalization performance of APVD. IMBFN introduces an amplitude-trainable SincNet (AT-SincNet) filter bank and applies it as the front-end frequency division network. In addition, IMBFN uses a purpose-designed two-path, one-dimensional depthwise separable convolutional neural network (CNN)-based feature extractor to extract meaningful voice features. The classification results of the individual voice frames are then combined to judge whether the voice as a whole is pathological. Comparative experiments were conducted using data from the MEEI, SVD, and HUPA databases. The largest improvements in accuracy, F1-score, and Matthews correlation coefficient (MCC) were 0.1705, 0.1977, and 0.4463, respectively. Blind tests were also carried out on participants from the First Affiliated Hospital of Soochow University, yielding an accuracy, F1-score, and MCC of 0.7594, 0.8491, and 0.2981, respectively. The results demonstrate that IMBFN provides meaningful explanations, good APVD performance, and better generalization than existing methods.
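The abstract describes combining frame-level classifications into one decision per voice and reporting accuracy, F1-score, and MCC. The Python sketch below illustrates that flow under loudly stated assumptions: the aggregation rule (a simple share-of-frames threshold), the function names, and the toy frame labels are all hypothetical, since the abstract does not specify how IMBFN combines frame results. Only the metric formulas are standard.

```python
import math


def judge_recording(frame_labels, threshold=0.5):
    """Return 1 (pathological) if the share of pathological frames exceeds threshold.

    Assumption: a simple vote over frames; the paper's actual rule is not
    given in the abstract.
    """
    if not frame_labels:
        raise ValueError("need at least one frame")
    return int(sum(frame_labels) / len(frame_labels) > threshold)


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = pathological)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical frame-level predictions for four recordings (not data from the paper).
frames = [[1, 1, 0, 1], [0, 0, 1, 0], [1, 0, 1, 1], [0, 1, 1, 0]]
y_true = [1, 0, 1, 1]
y_pred = [judge_recording(f) for f in frames]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("F1-score:", f1_score(*counts))
print("MCC:", mcc(*counts))
```

Because MCC uses all four confusion-matrix cells, it can stay low while accuracy and F1-score look high on a test set dominated by pathological voices, which is one possible reading of the gap between the blind-test F1-score and MCC reported above.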
Pages: 17