Detection of breath sounds in speech: A deep learning approach

Cited by: 0

Authors:

Arafath, K. Mohamed Ismail Yasar [1]

Routray, Aurobinda [1]

Affiliations:

[1] Indian Inst Technol, Dept Elect Engn, Kharagpur 721302, India

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2025 / Vol. 141

Keywords:

Breath sound detection; Breath annotation; Deep learning; Mel-spectrogram; Self-supervised learning; SIGNAL;

DOI:

10.1016/j.engappai.2024.109808

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Breath sound detection from speech recordings has wide-ranging applications, from high-quality audio recordings to medical diagnostics. However, perceptual recognition of breath sounds for annotation is prone to errors, and breath sounds typically occupy only 5% of speech recordings, leading to significant class imbalance. Additionally, the limited availability of annotated data makes the application of deep learning (DL) methods challenging. This paper proposes the use of thermal and normal videos, alongside speech data, to mitigate annotation errors in breath sound detection. To address class imbalance, we leverage self-supervised learning (SSL), employing a jigsaw puzzle solver as a pretext task to augment training data and enhance model performance. The jigsaw puzzle solver helps address the class imbalance by creating a balanced task for pretraining, improving the performance of the downstream task. This work also uses convolutional neural networks (CNN) and bidirectional long short-term memory (BiLSTM) models to locate breath sounds in speech recordings accurately. The proposed SSL implementation achieves an F1-score of 96% in a speaker-independent configuration. The proposed algorithm has also been tested on publicly available audio recordings from YouTube, and the BiLSTM version is available for testing on Hugging Face.
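
The abstract does not say how the jigsaw pretext task is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the mel-spectrogram is a list of time frames, that pieces are cut along the time axis, and that every permutation of the pieces is a class. The function name `make_jigsaw_example` and the parameter `n_pieces` are hypothetical. Because each permutation is drawn with equal probability, the pretext labels are balanced even when breath frames are rare in the underlying audio.

```python
import itertools
import random


def make_jigsaw_example(frames, n_pieces=3, rng=None):
    """Build one jigsaw pretext example (illustrative assumption, not the paper's code).

    frames   : list of spectrogram frames (each frame is a list of mel-band values)
    n_pieces : number of equal-length time segments to cut (hypothetical default)
    rng      : optional random.Random instance for reproducibility

    Returns (shuffled_frames, label), where label indexes the permutation used.
    """
    if rng is None:
        rng = random.Random()

    piece_len = len(frames) // n_pieces
    if piece_len == 0:
        raise ValueError("need at least n_pieces frames")

    # Every ordering of the pieces is one class of the pretext task.
    perms = list(itertools.permutations(range(n_pieces)))

    # Cut into equal time segments; leftover frames at the end are dropped.
    pieces = [frames[i * piece_len:(i + 1) * piece_len] for i in range(n_pieces)]

    # Draw a permutation uniformly, so the pretext classes are balanced.
    label = rng.randrange(len(perms))
    shuffled = [frame for idx in perms[label] for frame in pieces[idx]]
    return shuffled, label


# Toy input: 12 frames with 4 mel bands each, where the value encodes the frame index.
toy_frames = [[float(t)] * 4 for t in range(12)]
toy_shuffled, toy_label = make_jigsaw_example(toy_frames, n_pieces=3, rng=random.Random(0))
print(toy_label, [frame[0] for frame in toy_shuffled])
```

A pretraining network would take `toy_shuffled` as input and predict `toy_label`. Its learned weights could then initialise the downstream breath detector.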

Pages: 12

