Speech enhancement method based on multi-domain fusion and neural architecture search

Authors
Zhang R. [1]
Zhang P. [1]
Sun C. [1]
Affiliation
[1] College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan
Source
Tongxin Xuebao/Journal on Communications | 2024 / Vol. 45 / No. 02
Funding
National Natural Science Foundation of China
Keywords
complex neural architecture search; complex spatial domain mapping; low-cost evaluation; multi-domain fusion; speech enhancement model
DOI
10.11959/j.issn.1000-436x.2024018
Abstract
To further improve the self-learning and noise-reduction capabilities of speech enhancement models, a speech enhancement method based on multi-domain fusion and neural architecture search is proposed. A multi-spatial-domain mapping and fusion mechanism for speech signals is designed to mine correlations between the real and complex domains. Based on the convolution and pooling characteristics of the model, a complex neural architecture search mechanism is proposed, in which the speech enhancement model is constructed efficiently and automatically through a purpose-designed search space, search strategy, and evaluation strategy. In comparison and generalization experiments against baseline models, the best searched model improves the PESQ and STOI metrics by 5.6% over the best baseline model while using the fewest model parameters. © 2024 Editorial Board of Journal on Communications. All rights reserved.
Pages: 225-239
Page count: 14