SELF-ATTENTION FOR AUDIO SUPER-RESOLUTION

Cited by: 6

Authors:

Rakotonirina, Nathanael Carraz [1]

Affiliations:

[1] Univ Antananarivo, Antananarivo, Madagascar

Source:

2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2021

Keywords:

audio super-resolution; bandwidth extension; self-attention; NARROW-BAND; SPEECH

DOI:

10.1109/MLSP52302.2021.9596082

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Convolutions operate only locally and therefore fail to model global interactions. Self-attention, by contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
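The abstract gives no implementation details, so the following is only a rough NumPy sketch of what "self-attention modulating convolutional activations" could look like, not the paper's actual layer. Everything specific here is an assumption made for illustration: the function names (`self_attention`, `attention_modulate`), the block-wise max pooling, the single attention head, the purely multiplicative modulation, and the random projection matrices `wq`, `wk`, `wv`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention.
    # x: (n_blocks, channels); every block attends to every other block.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def attention_modulate(features, block_size, wq, wk, wv):
    # features: (time, channels) activations from a convolutional layer.
    # Assumption: time is divisible by block_size.
    t, c = features.shape
    n_blocks = t // block_size
    blocks = features.reshape(n_blocks, block_size, c)
    pooled = blocks.max(axis=1)                    # one summary vector per block
    weights = self_attention(pooled, wq, wk, wv)   # (n_blocks, channels)
    # Scale every time step in a block by that block's modulation weights.
    return (blocks * weights[:, None, :]).reshape(t, c)

rng = np.random.default_rng(0)
features = rng.standard_normal((32, 8))            # hypothetical conv activations
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out = attention_modulate(features, block_size=4, wq=wq, wk=wk, wv=wv)
```

Because the attention step is a handful of matrix products over all blocks at once, nothing in it has to be computed step by step along the sequence, which is the kind of parallelism the abstract contrasts with recurrent networks.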

Pages: 6

