MDCTNet: A Hybrid Approach to Neural Audio Coding

Cited by: 0

Authors:

Villemoes, Lars [1]

Vinton, Mark [2]

Ekstrand, Per [1]

Lu, Lie [2]

Davidson, Grant [2]

Zhou, Cong [2,3]

Affiliations:

[1] Dolby Sweden AB, Adv Technol Grp, S-11330 Stockholm, SE, Sweden

[2] Dolby Labs Inc, Adv Technol Grp, San Francisco, CA 94103 USA

[3] Anuttacon, Santa Clara, CA 95054 USA

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2024, Vol. 18, No. 08

Keywords:

Decoding; Psychoacoustic models; Transforms; Codecs; Bit rate; Audio coding; Entropy; Distortion; Training; Time-frequency analysis; Perceptual audio coding; deep learning; generative models; neural networks

DOI:

10.1109/JSTSP.2024.3482721

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

We describe and evaluate a hybrid neural audio coding system consisting of a perceptual audio encoder and a generative model, MDCTNet. By applying recurrent layers (RNNs), we capture correlations in both time and frequency directions in a perceptually weighted adaptive modified discrete cosine transform (MDCT) domain. By training MDCTNet on a diverse set of full-range monophonic audio signals at 48 kHz sampling, we achieve performance competitive with state-of-the-art audio coding at 24 kb/s variable bitrate (VBR). We also quantify the effect of generative model-based decoding at lower and higher bitrates and discuss some caveats of using data-driven signal reconstruction for the audio coding task.


Pages: 1506-1516

Page count: 11

