MDCTNet: A Hybrid Approach to Neural Audio Coding
Cited by: 0
Authors:
Villemoes, Lars [1]
Vinton, Mark [2]
Ekstrand, Per [1]
Lu, Lie [2]
Davidson, Grant [2]
Zhou, Cong [2,3]
Affiliations:
[1] Dolby Sweden AB, Adv Technol Grp, S-11330 Stockholm, SE, Sweden
[2] Dolby Labs Inc, Adv Technol Grp, San Francisco, CA 94103 USA
[3] Anuttacon, Santa Clara, CA 95054 USA
Keywords:
Decoding;
Psychoacoustic models;
Transforms;
Codecs;
Bit rate;
Audio coding;
Entropy;
Distortion;
Training;
Time-frequency analysis;
Perceptual audio coding;
deep learning;
generative models;
neural networks;
DOI: 10.1109/JSTSP.2024.3482721
Chinese Library Classification (CLC) numbers:
TM [Electrical engineering];
TN [Electronics and communication technology];
Discipline classification codes:
0808;
0809;
Abstract:
We describe and evaluate a hybrid neural audio coding system consisting of a perceptual audio encoder and a generative model, MDCTNet. By applying recurrent neural network (RNN) layers, we capture correlations in both the time and frequency directions in a perceptually weighted, adaptive modified discrete cosine transform (MDCT) domain. By training MDCTNet on a diverse set of full-range monophonic audio signals sampled at 48 kHz, we achieve performance competitive with state-of-the-art audio coding at 24 kb/s variable bitrate (VBR). We also quantify the effect of generative model-based decoding at lower and higher bitrates and discuss some caveats of using data-driven signal reconstruction for the audio coding task.
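For readers unfamiliar with the MDCT domain the abstract refers to, the sketch below shows a plain, textbook sine-windowed MDCT with 50% overlap and time-domain aliasing cancellation (TDAC). It is an illustrative assumption only: it is not the paper's perceptually weighted, adaptive MDCT, and the function names (`sine_window`, `mdct`, `imdct`, `analyze`, `synthesize`) are hypothetical. It shows the property a frame-based coder relies on: each hop of N new samples yields exactly N coefficients, and overlap-adding the inverse transforms reconstructs the input.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    two_n = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _basis(n, k, n_coeffs):
    # Textbook MDCT kernel: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    return math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))


def mdct(frame, window):
    """Map a windowed block of 2N samples to N coefficients (critically sampled)."""
    n_coeffs = len(frame) // 2
    return [
        sum(window[n] * frame[n] * _basis(n, k, n_coeffs) for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples.

    The 2/N scale pairs with a window normalized so that w[n]**2 + w[n + N]**2 == 1.
    """
    n_coeffs = len(coeffs)
    return [
        window[n] * (2.0 / n_coeffs)
        * sum(coeffs[k] * _basis(n, k, n_coeffs) for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def analyze(signal, n_coeffs):
    """Split a signal into 50%-overlapping frames of 2N samples (hop N) and transform each."""
    window = sine_window(n_coeffs)
    # Pad N zeros in front and enough at the back so every input sample lies in two frames.
    tail = n_coeffs + (-len(signal)) % n_coeffs
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * tail
    return [
        mdct(padded[start:start + 2 * n_coeffs], window)
        for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs)
    ]


def synthesize(frames, n_coeffs, length):
    """Inverse-transform each frame and overlap-add; aliasing cancels between neighbouring frames."""
    window = sine_window(n_coeffs)
    out = [0.0] * ((len(frames) + 1) * n_coeffs)
    for i, coeffs in enumerate(frames):
        for n, value in enumerate(imdct(coeffs, window)):
            out[i * n_coeffs + n] += value
    # Drop the leading padding and trim to the original length.
    return out[n_coeffs:n_coeffs + length]


if __name__ == "__main__":
    import random

    random.seed(0)
    x = [random.uniform(-1.0, 1.0) for _ in range(100)]
    frames = analyze(x, 16)
    y = synthesize(frames, 16, len(x))
    print(len(frames), len(frames[0]))  # number of frames, coefficients per frame
    print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error, near zero
```

The direct O(N²) sums are chosen for readability; practical codecs compute the MDCT through an FFT. In a coder such as the one described, quantization and any learned modelling would act on the per-frame coefficient lists returned by `analyze`, not on the time-domain samples.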
Pages: 1506–1516
Number of pages: 11