SELF-SUPERVISED AUDIO ENCODER WITH CONTRASTIVE PRETRAINING FOR RESPIRATORY ANOMALY DETECTION

Cited by: 6
Authors
Kulkarni, Shubham [1]
Watanabe, Hideaki [1]
Homma, Fuminori [1]
Affiliations
[1] Sony Grp Corp, Tokyo, Japan
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW | 2023
Keywords
Acoustic featurisation; direct waveform audio encoder; self-supervised contrastive audio-encoder (CVAE); respiratory anomaly detection; health monitoring;
DOI
10.1109/ICASSPW59220.2023.10193030
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Accurate analysis of lung sounds is essential for early disease detection and monitoring. We propose a self-supervised contrastive audio encoder for automated respiratory anomaly detection. The model consists of a direct waveform audio encoder trained in two stages. First, self-supervised pretraining using an acoustic dataset (Audioset) is used to extract high-level representations of the input audio. Second, domain-specific semi-supervised contrastive training is employed on a respiratory database to distinguish cough and breathing sounds. This direct waveform-based encoder outperforms conventional mel-frequency cepstral coefficients (MFCC) and image spectrogram features with CNN-ResNet-based detection models. It is also shown that the pretraining using varied audio sounds significantly improves detection accuracy compared to speech featurization models such as Wav2Vec2.0 and HuBERT. The proposed model achieves the highest accuracy score (91%) and inter-patient (specificity and sensitivity) evaluation score (84.1%) on the largest respiratory anomaly detection dataset. Our work further contributes to remote patient care via accurate continuous monitoring of respiratory abnormalities.
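The abstract reports an evaluation score built from specificity and sensitivity but does not state how the two are combined. The sketch below assumes the score is their arithmetic mean, a common convention in respiratory sound benchmarks; the function names and the binary label encoding (1 = anomalous, 0 = normal) are illustrative and do not come from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = anomalous recording, 0 = normal recording.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def combined_score(y_true, y_pred):
    """Assumed combination: the mean of sensitivity and specificity."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    print(sensitivity_specificity(y_true, y_pred))
    print(combined_score(y_true, y_pred))
```

For an inter-patient evaluation, the predictions passed in would come from recordings of patients who do not appear in the training split, so the score reflects generalization to unseen patients.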
Pages: 5