Mitigating Information Interruptions by COVID-19 Face Masks: A Three-Stage Speech Enhancement Scheme

Cited by: 4

Authors:

Dash, Tusar Kanti [1]

Chakraborty, Chinmay [2]

Mahapatra, Satyajit [3]

Panda, Ganapati [1]

Affiliations:

[1] CV Raman Global Univ, Dept Elect & Commun Engn, Bhubaneswar 752054, India

[2] Birla Inst Technol, Dept Elect & Commun Engn, Mesra 835215, India

[3] VIT Bhopal Univ, Sch Elect & Elect Engn, Bhopal 466114, India

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / No. 04

Keywords:

Feature extraction; COVID-19; Signal to noise ratio; Faces; Q-factor; Speech enhancement; Face recognition; Coronavirus disease 2019 (COVID-19); face mask; gray wolf optimizer (GWO); information interruptions; speech enhancement (SE); tunable Q-factor wavelet transform (TQWT);

DOI:

10.1109/TCSS.2022.3210988

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The coronavirus disease 2019 (COVID-19) preventive measures have resulted in significant lifestyle changes. One feature of the COVID-19 new normal is the use of face masks for protection against airborne aerosols, which creates distractions and interruptions in voice communication. A face mask influences speech differently from the standard concept of noise affecting speech communication. Furthermore, its effects on speech vary across frequency bands. To address this problem, a three-stage adaptive speech enhancement (SE) scheme is developed in this article. In the first stage, the tunable $Q$-factor wavelet transform (TQWT) features are extracted from the input speech signal by properly setting the quality factor values and the number of levels. In the second stage, the adjustable parameters of the preemphasis filter and modified multiband spectral subtraction (MBSS) are determined using bio-inspired techniques for different masking and signal-to-noise ratio (SNR) conditions. In the third stage, the weights, center values, standard deviation of the Gaussian radial basis functions, and input patterns of the radial basis function neural networks (RBFNNs) are updated to predict the optimized parameters from the input TQWT-based cepstral features (TQCFs). Finally, the performance of the proposed algorithm is compared with that of standard SE algorithms using two speech datasets.
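
As a rough illustration of two building blocks named in the abstract, the sketch below shows a textbook first-order preemphasis filter and the forward pass of a Gaussian radial basis function network with one output. It is not the authors' implementation: this record does not give the paper's filter form, feature dimensions, or parameter values, so the function names, the coefficient `alpha`, and all numbers in the usage lines are assumptions chosen only for illustration.

```python
import math


def preemphasis(signal, alpha=0.97):
    """Textbook first-order preemphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a common default, not a value taken from the paper.
    """
    if not signal:
        return []
    out = [signal[0]]  # first sample has no predecessor, so it passes through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def rbf_predict(features, centers, sigmas, weights, bias=0.0):
    """Scalar output of a Gaussian RBF network.

    Each hidden unit computes exp(-||x - c||^2 / (2 * sigma^2));
    the output is the weighted sum of the hidden units plus a bias.
    """
    total = bias
    for center, sigma, weight in zip(centers, sigmas, weights):
        dist_sq = sum((f - c) ** 2 for f, c in zip(features, center))
        total += weight * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return total


# Hypothetical usage: a network maps a feature vector to a filter
# coefficient, which is then applied to a short signal.
alpha_hat = rbf_predict([0.0, 0.0], centers=[[0.0, 0.0]], sigmas=[1.0], weights=[0.9])
filtered = preemphasis([1.0, 1.0, 1.0], alpha=alpha_hat)
```

In the scheme described above, the predicted quantities would be the optimized preemphasis and MBSS parameters and the inputs would be TQCF vectors; here a single made-up center and weight stand in for a trained network.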


Pages: 4790-4799

Page count: 10

References: 57 in total

