HILCodec: High-Fidelity and Lightweight Neural Audio Codec

Authors
Ahn, Sunghwan [1, 2]
Woo, Beom Jun [1, 2]
Han, Min Hyun [1, 2]
Moon, Chanyeong [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
Keywords
Codecs; Convolution; Decoding; Vocoders; Psychoacoustic models; Training; Speech coding; Spectrogram; Generative adversarial networks; Distortion; Acoustic signal processing; audio coding; residual neural networks
DOI
10.1109/JSTSP.2024.3469530
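Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809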
Abstract
The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of the SEANet-based codec does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.
Pages: 1517-1530
Page count: 14