SELF-SUPERVISED AUDIO ENCODER WITH CONTRASTIVE PRETRAINING FOR RESPIRATORY ANOMALY DETECTION

Cited by: 6

Authors:

Kulkarni, Shubham [1]

Watanabe, Hideaki [1]

Homma, Fuminori [1]

Affiliations:

[1] Sony Grp Corp, Tokyo, Japan

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW | 2023

Keywords:

Acoustic featurisation; direct waveform audio encoder; self-supervised contrastive audio-encoder (CVAE); respiratory anomaly detection; health monitoring;

DOI:

10.1109/ICASSPW59220.2023.10193030

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Accurate analysis of lung sounds is essential for early disease detection and monitoring. We propose a self-supervised contrastive audio encoder for automated respiratory anomaly detection. The model consists of a direct waveform audio encoder trained in two stages. First, self-supervised pretraining using an acoustic dataset (Audioset) is used to extract high-level representations of the input audio. Second, domain-specific semi-supervised contrastive training is employed on a respiratory database to distinguish cough and breathing sounds. This direct waveform-based encoder outperforms conventional mel-frequency cepstral coefficients (MFCC) and image spectrogram features with CNN-ResNet-based detection models. It is also shown that the pretraining using varied audio sounds significantly improves detection accuracy compared to speech featurization models such as Wav2Vec2.0 and HuBERT. The proposed model achieves the highest accuracy score (91%) and inter-patient (specificity and sensitivity) evaluation score (84.1%) on the largest respiratory anomaly detection dataset. Our work further contributes to remote patient care via accurate continuous monitoring of respiratory abnormalities.
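The abstract reports a combined specificity-and-sensitivity evaluation score but does not state how the two are combined. The sketch below assumes the arithmetic mean of sensitivity and specificity, a common convention in respiratory-sound benchmarks. The function names and the binary label encoding (1 = anomalous, 0 = normal) are hypothetical and chosen for illustration only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = anomalous, 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals kept normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def combined_score(y_true, y_pred):
    """Assumed combined score: mean of sensitivity and specificity."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(y_true, y_pred))
    print(combined_score(y_true, y_pred))
```

Averaging the two rates keeps a model from scoring well by predicting only the majority class, which is presumably why such a score is reported alongside plain accuracy.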

Pages: 5
