Skoglund, Jan [1]
Kleijn, W. Bastiaan [1,3]
Storus, Andrew [1]
Yeh, Hengchin [1]
Affiliations:
[1] Google LLC, San Francisco, CA 94105 USA
[2] Yonsei Univ, Elect & Elect Engn, Seoul, South Korea
[3] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
Source: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
In this paper, we propose a high-rate extension of the SoundStream codec that achieves near-transparent quality for wideband speech at 16 kbps. SoundStream performs reasonably well at low bit rates (e.g., around 9 kbps), but its quality improves little when more bits are spent on encoding the latent embeddings. Motivated by experimental results showing that neural audio codec performance depends strongly on characteristics of the latent embeddings, such as their dimensionality, their dependencies, and the shape of their probability density function, we propose a convolutional transformer architecture and an attention-based multi-scale latent decomposition method that significantly improve codec performance when quantizing high-dimensional embeddings. Experimental results show that the proposed model outperforms conventional approaches.
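The baseline SoundStream codec quantizes its latent embeddings with a residual vector quantizer (RVQ), so "using more bits" means adding quantizer stages, each sending log2(K) bits per latent frame. The NumPy sketch below illustrates only that generic baseline mechanism. It is not the proposed convolutional transformer or the attention-based multi-scale decomposition. The random codebooks, the toy dimensions, and the 50 Hz frame rate in the usage note are hypothetical assumptions chosen for illustration.

```python
import math
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy residual VQ: each stage quantizes the residual left by earlier stages."""
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for cb in codebooks:  # each codebook has shape (K, D)
        k = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))  # nearest codeword
        indices.append(k)
        residual = residual - cb[k]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruction is the sum of the selected codewords over all stages."""
    return sum(cb[k] for cb, k in zip(codebooks, indices))

def bitrate_bps(frame_rate_hz, num_quantizers, codebook_size):
    """Each stage sends log2(K) bits per latent frame."""
    return frame_rate_hz * num_quantizers * math.log2(codebook_size)

# Hypothetical toy setup: random (untrained) codebooks whose scale shrinks per stage.
rng = np.random.default_rng(0)
dim, size, stages = 8, 16, 4
codebooks = [rng.normal(scale=0.5 ** s, size=(size, dim)) for s in range(stages)]
for cb in codebooks:
    cb[0] = 0.0  # a zero codeword guarantees an extra stage never increases the error

x = rng.normal(size=dim)
errors = []
for n in range(1, stages + 1):
    idx = rvq_encode(x, codebooks[:n])
    errors.append(float(np.sum((x - rvq_decode(idx, codebooks[:n])) ** 2)))
print(errors)  # squared error after 1, 2, 3, 4 stages: non-increasing
```

For example, `bitrate_bps(50, 32, 1024)` returns `16000.0`, i.e. 16 kbps for a hypothetical 50 Hz latent frame rate with 32 stages of 1024-entry codebooks. These figures are illustrative, not the paper's configuration. The sketch shows why extra stages add bits linearly while the error reduction per stage can shrink, which is the regime the abstract describes.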