It's Raw! Audio Generation with State-Space Models

Cited by: 0

Authors:

Goel, Karan [1]

Gu, Albert [1]

Donahue, Chris [1]

Re, Christopher [1]

Affiliations:

[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SASHIMI, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. We identify that S4 can be unstable during autoregressive generation, and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices. SASHIMI yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting. Additionally, SASHIMI improves non-autoregressive generation performance when used as the backbone architecture for a diffusion model. Compared to prior architectures in the autoregressive generation setting, SASHIMI generates piano and speech waveforms which humans find more musical and coherent, respectively, e.g., 2x better mean opinion scores than WaveNet on an unconditional speech generation task. On a music generation task, SASHIMI outperforms WaveNet on density estimation and speed at both training and inference even when using 3x fewer parameters.


Pages: 18
