MDCTNet: A Hybrid Approach to Neural Audio Coding

Authors
Villemoes, Lars [1]
Vinton, Mark [2]
Ekstrand, Per [1]
Lu, Lie [2]
Davidson, Grant [2]
Zhou, Cong [2, 3]
Affiliations
[1] Dolby Sweden AB, Advanced Technology Group, S-11330 Stockholm, Sweden
[2] Dolby Laboratories Inc., Advanced Technology Group, San Francisco, CA 94103, USA
[3] Anuttacon, Santa Clara, CA 95054, USA
Keywords
Decoding; Psychoacoustic models; Transforms; Codecs; Bit rate; Audio coding; Entropy; Distortion; Training; Time-frequency analysis; Perceptual audio coding; Deep learning; Generative models; Neural networks
DOI
10.1109/JSTSP.2024.3482721
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
We describe and evaluate a hybrid neural audio coding system consisting of a perceptual audio encoder and a generative model, MDCTNet. By applying recurrent neural network (RNN) layers, we capture correlations in both the time and frequency directions in a perceptually weighted adaptive modified discrete cosine transform (MDCT) domain. By training MDCTNet on a diverse set of full-range monophonic audio signals sampled at 48 kHz, we achieve performance competitive with state-of-the-art audio coding at a 24 kb/s variable bitrate (VBR). We also quantify the effect of generative-model-based decoding at lower and higher bitrates and discuss some caveats of using data-driven signal reconstruction for the audio coding task.
Pages: 1506-1516
Number of pages: 11