Speech Enhancement Using Dynamical Variational AutoEncoder

Cited by: 0
Authors
Do, Hao D. [1]
Affiliations
[1] FPT Univ, Ho Chi Minh City, Vietnam
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2023, PT II | 2023 / Vol. 13996
Keywords
speech enhancement; dynamical variational autoEncoder; generative model;
DOI
10.1007/978-981-99-5837-5_21
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This research addresses speech enhancement with a generative model. Many existing solutions are trained on a fixed set of interference or noise types and struggle to extract speech from a mixture that contains unfamiliar noise. We use a class of generative models called Dynamical Variational AutoEncoders (DVAE), which combine generative and temporal models to analyze the speech signal. Models in this class focus on the behavior of the speech signal, then extract and enhance the speech. Moreover, we design a new architecture in the DVAE class, named Bi-RVAE, which is simpler than the other models yet achieves good results. Experimental results show that the DVAE class, including our proposed design, recovers high-quality speech. This class could be used to enhance the speech signal before passing it to the main processing models.
Pages: 247-258
Page count: 12