HILCodec: High-Fidelity and Lightweight Neural Audio Codec

Authors
Ahn, Sunghwan [1, 2]
Woo, Beom Jun [1, 2]
Han, Min Hyun [1, 2]
Moon, Chanyeong [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
Keywords
Codecs; Convolution; Decoding; Vocoders; Psychoacoustic models; Training; Speech coding; Spectrogram; Generative adversarial networks; Distortion; Acoustic signal processing; Audio coding; Residual neural networks
DOI
10.1109/JSTSP.2024.3469530
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of the SEANet-based codec does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.
Pages: 1517-1530
Page count: 14