MDCTNet: A Hybrid Approach to Neural Audio Coding

Cited by: 0
Authors
Villemoes, Lars [1]
Vinton, Mark [2]
Ekstrand, Per [1]
Lu, Lie [2]
Davidson, Grant [2]
Zhou, Cong [2, 3]
Affiliations
[1] Dolby Sweden AB, Adv Technol Grp, S-11330 Stockholm, SE, Sweden
[2] Dolby Labs Inc, Adv Technol Grp, San Francisco, CA 94103 USA
[3] Anuttacon, Santa Clara, CA 95054 USA
Keywords
Decoding; Psychoacoustic models; Transforms; Codecs; Bit rate; Audio coding; Entropy; Distortion; Training; Time-frequency analysis; Perceptual audio coding; Deep learning; Generative models; Neural networks
DOI
10.1109/JSTSP.2024.3482721
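Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809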
Abstract
We describe and evaluate a hybrid neural audio coding system consisting of a perceptual audio encoder and a generative model, MDCTNet. By applying recurrent layers (RNNs), we capture correlations in both the time and frequency directions in a perceptually weighted adaptive modified discrete cosine transform (MDCT) domain. By training MDCTNet on a diverse set of full-range monophonic audio signals at 48 kHz sampling, we achieve performance competitive with state-of-the-art audio coding at 24 kb/s variable bitrate (VBR). We also quantify the effect of generative-model-based decoding at lower and higher bitrates and discuss some caveats of using data-driven signal reconstruction for the audio coding task.
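For readers unfamiliar with the transform named in the abstract, the following is a minimal sketch of a plain, fixed-block-size MDCT with a sine window and its overlap-add inverse. It is not the perceptually weighted adaptive MDCT used by the authors, and it says nothing about MDCTNet itself. The function names (`mdct`, `imdct`, `sine_window`, `mdct_basis`) and the block size `N` are hypothetical choices made for illustration only.

```python
import numpy as np

def mdct_basis(N):
    # Cosine basis of shape (N, 2N): N coefficients per 2N-sample frame.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def sine_window(N):
    # Sine window; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley).
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct(x, N):
    # Forward MDCT of a signal whose length is a multiple of N.
    # N zeros are padded on each side so every input sample is
    # covered by two overlapping frames (hop size N).
    w, C = sine_window(N), mdct_basis(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    n_frames = len(padded) // N - 1
    frames = np.stack([padded[i * N:i * N + 2 * N] for i in range(n_frames)])
    return (frames * w) @ C.T  # shape (n_frames, N)

def imdct(X, N):
    # Inverse MDCT with windowed overlap-add; the time-domain aliasing
    # of adjacent frames cancels in the overlap region.
    w, C = sine_window(N), mdct_basis(N)
    frames = (2.0 / N) * (X @ C) * w  # shape (n_frames, 2N)
    out = np.zeros((X.shape[0] + 1) * N)
    for i, frame in enumerate(frames):
        out[i * N:i * N + 2 * N] += frame
    return out[N:-N]  # drop the zero padding added in mdct()

# Round trip on a random test signal (assumed block size N = 16).
rng = np.random.default_rng(0)
x = rng.standard_normal(160)
X = mdct(x, 16)
x_hat = imdct(X, 16)
```

The matrix form is chosen for readability rather than speed; a practical codec would use an FFT-based MDCT and would quantize `X` before decoding.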
Pages: 1506-1516
Number of pages: 11