It's Raw! Audio Generation with State-Space Models

Cited by: 0
Authors
Goel, Karan [1]
Gu, Albert [1]
Donahue, Chris [1]
Re, Christopher [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SASHIMI, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. We identify that S4 can be unstable during autoregressive generation, and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices. SASHIMI yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting. Additionally, SASHIMI improves non-autoregressive generation performance when used as the backbone architecture for a diffusion model. Compared to prior architectures in the autoregressive generation setting, SASHIMI generates piano and speech waveforms that humans find more musical and more coherent, respectively, e.g., 2× better mean opinion scores than WaveNet on an unconditional speech generation task. On a music generation task, SASHIMI outperforms WaveNet on density estimation and speed at both training and inference even when using 3× fewer parameters.
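The stability point in the abstract can be illustrated with a small sketch. A continuous-time linear state-space model x'(t) = A x(t) + B u(t) is stable only if A is Hurwitz, meaning every eigenvalue of A has a negative real part. The abstract does not spell out the parameterization, so the code below rests on an assumption: the state matrix has a diagonal-plus-rank-one form, and tying the two rank-one vectors together with a negative sign (A = Λ − p p*) keeps A Hurwitz whenever the diagonal Λ has negative real parts. The helper names (`eig2`, `diag_plus_rank_one`, `is_hurwitz`) and the 2×2 example values are hypothetical and chosen only for illustration.

```python
import cmath


def eig2(a, b, c, d):
    """Eigenvalues of the 2x2 matrix [[a, b], [c, d]] via the quadratic formula."""
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def diag_plus_rank_one(lam, p, q, sign):
    """Entries (a, b, c, d) of diag(lam) + sign * p q^* for 2-dimensional inputs."""
    a = lam[0] + sign * p[0] * q[0].conjugate()
    b = sign * p[0] * q[1].conjugate()
    c = sign * p[1] * q[0].conjugate()
    d = lam[1] + sign * p[1] * q[1].conjugate()
    return a, b, c, d


def is_hurwitz(eigs):
    """True if every eigenvalue lies strictly in the left half-plane."""
    return all(e.real < 0 for e in eigs)


# Illustrative values (assumptions, not taken from the paper).
lam = (complex(-0.5, 1.0), complex(-0.5, -1.0))  # diagonal part, Re < 0
p = (complex(1.0, 0.0), complex(1.0, 0.0))
q = (complex(1.0, 0.0), complex(1.0, 0.0))

# Unconstrained form: diag(lam) + p q^*. Nothing prevents the rank-one
# term from pushing eigenvalues into the right half-plane.
unconstrained = eig2(*diag_plus_rank_one(lam, p, q, +1))

# Tied form: diag(lam) - p p^*. The Hermitian part of this matrix is
# 2*Re(diag(lam)) - 2*p p^*, which is negative definite when Re(lam) < 0,
# so every eigenvalue has a negative real part.
tied = eig2(*diag_plus_rank_one(lam, p, p, -1))

print("unconstrained Hurwitz:", is_hurwitz(unconstrained))
print("tied Hurwitz:", is_hurwitz(tied))
```

With these values the unconstrained matrix fails the Hurwitz check while the tied one passes, which is the kind of failure mode that can make a recurrent rollout diverge during autoregressive generation.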
Pages: 18