Neural Kalman Filters for Acoustic Echo Cancellation: Comparison of deep neural network-based extensions [Special Issue On Model-Based and Data-Driven Audio Signal Processing]

Cited by: 1
Authors
Seidel, Ernst [1]
Enzner, Gerald [2]
Mowlaee, Pejman [3]
Fingscheidt, Tim [4]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, D-38106 Braunschweig, Germany
[2] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, Div Speech Technol & Hearing Aids, D-26111 Oldenburg, Germany
[3] GN Adv Sci, DK-2750 Ballerup, Denmark
[4] Tech Univ Carolo Wilhelmina Braunschweig, D-38106 Braunschweig, Germany
Keywords
Training; Measurement; Echo cancellers; Filtering; Special issues and sections; Noise; Training data; Acoustics; Kalman filters; Speech processing
DOI
10.1109/MSP.2024.3449557
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Kalman filtering is a powerful approach to adaptive filtering for various problems in signal processing. The frequency-domain adaptive Kalman filter (FDKF), based on the concept of the acoustic state space, provides a unifying solution to the adaptive filter update and the related stepsize control. It was conceived for the problem of acoustic echo cancellation and, as such, is frequently applied in hands-free systems. This article motivates and briefly recapitulates the linear FDKF and investigates how it can be further supported by deep neural networks (DNNs) in various ways, specifically to overcome the challenges and limitations related to the usually required estimation of process and observation noise covariances for the Kalman filter. While the mere FDKF comes with very low computational complexity, its neural Kalman filter variants may deliver faster (re)convergence, better echo cancellation, and even exceed the FDKF in its excellent double-talk near-end speech preservation both under linear and nonlinear loudspeaker conditions. To provide a synopsis of the state of the art, this article contributes a comparison of a range of DNN-based extensions of FDKF in the same training framework and using the same data. © 1991-2012 IEEE.
Pages: 24-38
Number of pages: 15