HILCodec: High-Fidelity and Lightweight Neural Audio Codec

Cited by: 0

Authors:

Ahn, Sunghwan [1,2]

Woo, Beom Jun [1,2]

Han, Min Hyun [1,2]

Moon, Chanyeong [1,2]

Kim, Nam Soo [1,2]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2024 / Vol. 18 / No. 8

Keywords:

Codecs; Convolution; Decoding; Vocoders; Psychoacoustic models; Training; Speech coding; Spectrogram; Generative adversarial networks; Distortion; Acoustic signal processing; audio coding; codecs; generative adversarial networks; residual neural networks;

DOI:

10.1109/JSTSP.2024.3469530

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of the SEANet-based codec does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.


Pages: 1517-1530

Number of pages: 14
