SELF-ATTENTION FOR AUDIO SUPER-RESOLUTION

Cited by: 6
Authors
Rakotonirina, Nathanael Carraz [1]
Affiliations
[1] Univ Antananarivo, Antananarivo, Madagascar
Source
2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2021
Keywords
audio super-resolution; bandwidth extension; self-attention; NARROW-BAND; SPEECH;
DOI
10.1109/MLSP52302.2021.9596082
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutions operate only locally and therefore fail to model global interactions. Self-attention, by contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
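As a rough illustration of the idea in the abstract, the sketch below modulates convolutional activations with a scale computed by self-attention. It is not the paper's implementation. The block splitting, the max pooling, the single attention head, the purely multiplicative modulation, and the names `afilm_sketch`, `block_size`, `w_q`, `w_k`, `w_v` are all assumptions made for illustration; this record does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over the rows of x.
    # x: (n, d); w_q, w_k, w_v: (d, d) projection matrices (hypothetical).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def afilm_sketch(features, block_size, w_q, w_k, w_v):
    # features: (T, C) activations of a 1-D convolutional layer.
    # Assumption: T is divisible by block_size.
    t, c = features.shape
    n_blocks = t // block_size
    blocks = features.reshape(n_blocks, block_size, c)

    # Summarise each block with one vector (assumed: max pooling over time).
    pooled = blocks.max(axis=1)                      # (n_blocks, C)

    # Every block summary attends to every other block, so the resulting
    # scale for one block can depend on distant parts of the signal.
    scale = self_attention(pooled, w_q, w_k, w_v)    # (n_blocks, C)

    # Feature-wise modulation: broadcast each block's scale over its steps.
    modulated = blocks * scale[:, None, :]
    return modulated.reshape(t, c)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))
y = afilm_sketch(x, block_size=4, w_q=w_q, w_k=w_k, w_v=w_v)
print(y.shape)
```

Unlike a recurrent layer, the attention step here processes all block summaries in one matrix product, which is the kind of parallelism the abstract refers to.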
Pages: 6