SELF-ATTENTION FOR AUDIO SUPER-RESOLUTION

Cited by: 6
Authors
Rakotonirina, Nathanael Carraz [1]
Affiliations
[1] Univ Antananarivo, Antananarivo, Madagascar
Source
2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2021
Keywords
audio super-resolution; bandwidth extension; self-attention; NARROW-BAND; SPEECH
DOI
10.1109/MLSP52302.2021.9596082
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutions operate only locally and therefore fail to model global interactions. Self-attention, by contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
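The abstract does not give the AFiLM computation in detail, so the following is only a rough NumPy sketch of the general idea it describes: self-attention, rather than a recurrent network, produces the weights that modulate convolutional activations. Everything specific here is an assumption made for illustration and is not taken from the paper: the block-wise max pooling, the single attention head, the purely multiplicative modulation, the hypothetical function name attention_modulate, and the parameters block_size, w_q, w_k and w_v.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (n_blocks, channels); every block attends to every other
    block, so the output at one position can depend on the whole sequence.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def attention_modulate(activations, block_size, w_q, w_k, w_v):
    """Modulate (time, channels) activations with attention-derived scales.

    Assumed scheme (illustrative, not the paper's exact layer):
    1. split the time axis into blocks of length block_size,
    2. max-pool each block to one vector per block,
    3. run self-attention over the pooled block sequence,
    4. multiply each block's activations by its attention output.
    """
    time, channels = activations.shape
    n_blocks = time // block_size
    usable = n_blocks * block_size  # drop any remainder for simplicity
    blocks = activations[:usable].reshape(n_blocks, block_size, channels)
    pooled = blocks.max(axis=1)                      # (n_blocks, channels)
    scale = self_attention(pooled, w_q, w_k, w_v)    # (n_blocks, channels)
    modulated = blocks * scale[:, None, :]           # broadcast over each block
    return modulated.reshape(usable, channels)


# Toy usage with random "convolutional" activations and random projections.
rng = np.random.default_rng(0)
time, channels, block_size = 64, 8, 16
acts = rng.standard_normal((time, channels))
w_q, w_k, w_v = (0.1 * rng.standard_normal((channels, channels)) for _ in range(3))
out = attention_modulate(acts, block_size, w_q, w_k, w_v)
print(out.shape)  # (64, 8)
```

Because the attention step is a handful of matrix products over all blocks at once, it has no step-by-step dependence along the sequence, which is consistent with the abstract's remark about parallelization.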
Pages: 6