SELF-ATTENTION FOR AUDIO SUPER-RESOLUTION

Cited by: 6

Authors:

Rakotonirina, Nathanael Carraz [1]

Affiliations:

[1] Univ Antananarivo, Antananarivo, Madagascar

Source:

2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) | 2021

Keywords:

audio super-resolution; bandwidth extension; self-attention; NARROW-BAND; SPEECH;

DOI:

10.1109/MLSP52302.2021.9596082

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Convolutions operate only locally and therefore fail to model global interactions. Self-attention, by contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
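
The idea in the abstract, self-attention producing weights that modulate convolutional activations, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the split of the time axis into blocks, the max pooling within each block, the single attention head with random (untrained) projection matrices, the purely multiplicative modulation, and the names `afilm_sketch`, `self_attention`, and `n_blocks`. It is not the author's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over the rows of x (n, d).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def afilm_sketch(features, n_blocks, w_q, w_k, w_v):
    # features: (T, C) activations from a convolutional layer,
    # with T divisible by n_blocks (assumption of this sketch).
    t, c = features.shape
    blocks = features.reshape(n_blocks, t // n_blocks, c)
    # Summarize each block by max pooling over time: (n_blocks, C).
    pooled = blocks.max(axis=1)
    # Every block summary attends to every other one, so the resulting
    # weights can depend on distant parts of the signal.
    weights = self_attention(pooled, w_q, w_k, w_v)
    # Scale each block's activations channel-wise by its weight vector.
    modulated = blocks * weights[:, None, :]
    return modulated.reshape(t, c)

rng = np.random.default_rng(0)
T, C, B = 64, 8, 4  # time steps, channels, blocks (hypothetical sizes)
feats = rng.standard_normal((T, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out = afilm_sketch(feats, B, w_q, w_k, w_v)
print(out.shape)  # (64, 8)
```

Unlike a recurrent network, which must process block summaries one after another, the attention step above handles all blocks in a single matrix product, which is the kind of parallelism the abstract credits for faster training.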

Pages: 6

