HILCodec: High-Fidelity and Lightweight Neural Audio Codec

Cited by: 0

Authors:

Ahn, Sunghwan [1,2]

Woo, Beom Jun [1,2]

Han, Min Hyun [1,2]

Moon, Chanyeong [1,2]

Kim, Nam Soo [1,2]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2024, Vol. 18, No. 8

Keywords:

Codecs; Convolution; Decoding; Vocoders; Psychoacoustic models; Training; Speech coding; Spectrogram; Generative adversarial networks; Distortion; Acoustic signal processing; audio coding; codecs; generative adversarial networks; residual neural networks

DOI:

10.1109/JSTSP.2024.3469530

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification code:

0808; 0809

Abstract:

The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of the SEANet-based codec does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.


Pages: 1517-1530

Page count: 14

