A depthwise separable CNN-based interpretable feature extraction network for automatic pathological voice detection

Cited by: 10

Authors:

Zhao, Denghuang [1]

Qiu, Zhixin [1]

Jiang, Yujie [1]

Zhu, Xincheng [1]

Zhang, Xiaojun [1]

Tao, Zhi [1]

Affiliations:

[1] Soochow Univ, 1 Shizi St, Suzhou, Peoples R China

Source:

BIOMEDICAL SIGNAL PROCESSING AND CONTROL | 2024 / Vol. 88

Funding:

National Natural Science Foundation of China;

Keywords:

Pathological voice detection; Deep learning; Interpretability; Depthwise separable CNN; CLASSIFICATION; INFORMATION; CEPSTRUM; VOWEL;

DOI:

10.1016/j.bspc.2023.105624

Chinese Library Classification:

R318 [Biomedical Engineering];

Subject classification code:

0831;

Abstract:

In recent years, deep learning methods for automatic pathological voice detection (APVD) have achieved satisfactory results. However, most of these methods cannot explain their performance, and interpretability is crucial for deep learning methods applied in the medical field. This lack of interpretability makes it hard for existing methods to generalize better than methods based on meaningful features in practical applications. This paper proposes an interpretable neural network architecture, the Interpretable Multi-band Feature Extraction Network (IMBFN), which is based on clear feature extraction logic and a comprehensive result judgment method, to improve the effectiveness and generalization performance of APVD. An amplitude-trainable SincNet (AT-SincNet) filter bank is introduced in IMBFN and applied as the front-end frequency division network. In addition, IMBFN uses a purpose-designed two-path one-dimensional depthwise separable convolutional neural network (CNN)-based feature extractor to extract meaningful voice features. The classification results of the individual voice frames are then combined to judge whether the voice is pathological. Comparative experiments were conducted using data from the MEEI, SVD, and HUPA databases. The best improvements in accuracy, F1-score, and Matthews correlation coefficient (MCC) reached 0.1705, 0.1977, and 0.4463, respectively. Blind tests were also carried out on participants from the First Affiliated Hospital of Soochow University, yielding an accuracy, F1-score, and MCC of 0.7594, 0.8491, and 0.2981, respectively. The results demonstrate that IMBFN provides meaningful explanations, good APVD performance, and better generalization than existing methods.
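
The abstract mentions two steps that a short sketch may make more concrete: combining per-frame classification results into one decision per voice, and scoring the result with the Matthews correlation coefficient. The record does not state the paper's actual judgment rule, so the majority-style vote below, the function names `judge_voice` and `mcc`, and the `threshold` parameter are all assumptions made for illustration. Only the MCC formula itself is standard.

```python
import math
from typing import Sequence


def judge_voice(frame_labels: Sequence[int], threshold: float = 0.5) -> int:
    """Combine per-frame labels (1 = pathological, 0 = healthy) into one label.

    Assumption: a simple vote, where the voice is called pathological when the
    share of pathological frames exceeds `threshold`. The paper's own rule may differ.
    """
    if not frame_labels:
        raise ValueError("at least one frame prediction is required")
    pathological_share = sum(frame_labels) / len(frame_labels)
    return 1 if pathological_share > threshold else 0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: report 0.0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    print(judge_voice([1, 1, 0, 1]))  # 3 of 4 frames pathological -> 1
    print(mcc(tp=1, tn=1, fp=0, fn=0))  # perfect agreement -> 1.0
```

MCC uses all four confusion-matrix cells, which is why it can stay low on an imbalanced test set even when accuracy and F1-score look high, as in the blind-test figures reported above.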

Pages: 17
