LEARNING CONTINUOUS REPRESENTATION OF AUDIO FOR ARBITRARY SCALE SUPER RESOLUTION

Cited by: 4

Authors:

Kim, Jaechang [1]

Lee, Yunjoo [1]

Hong, Seunghoon [2]

Ok, Jungseul [1]

Affiliations:

[1] POSTECH, Grad Sch AI, Pohang, South Korea

[2] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

audio super resolution; speech super resolution; bandwidth extension; implicit neural networks

DOI:

10.1109/ICASSP43922.2022.9746083

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Audio super resolution aims to predict the missing high resolution components of low resolution audio signals. While audio is by nature a continuous signal, current approaches treat it as discrete data (i.e., the input is defined on a discrete time domain) and consider super resolution over a fixed scale factor (i.e., a new neural network must be trained to change the output resolution). To obtain a continuous representation of audio and enable super resolution for arbitrary scale factors, we propose a method of implicit neural representation, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA). Our method locally parameterizes a chunk of audio as a function of continuous time, and represents each chunk with the local latent codes of neighboring chunks so that the function can extrapolate the signal at any time coordinate, i.e., infinite resolution. To learn a continuous representation for audio, we design a self-supervised learning strategy to practice super resolution tasks up to the original resolution by stochastic selection. Our numerical evaluation shows that LISA not only outperforms the previous fixed-scale methods with a fraction of the parameters, but is also capable of arbitrary scale super resolution even beyond the resolution of the training data.
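The idea of querying a local, continuous-time function at arbitrary time coordinates can be illustrated with a toy sketch. This is not the authors' implementation: the stand-in encoder, the untrained random decoder weights, the two-neighbor lookup, and all names (encode, query, super_resolve, LATENT_DIM, HIDDEN) are assumptions made for illustration only. It shows just the mechanism that makes the output length a free choice: each output sample is decoded from the latent codes of its two nearest input positions plus its relative time offset.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8   # hypothetical size of each local latent code
HIDDEN = 16      # hypothetical decoder width

# Toy, untrained weights; a real model would learn these.
P = rng.normal(size=(1, LATENT_DIM))
W1 = rng.normal(size=(2 * LATENT_DIM + 1, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, 1)) * 0.1
b2 = np.zeros(1)


def encode(low_res):
    """Stand-in encoder: one latent code per input sample."""
    return np.tanh(low_res[:, None] * P)          # (n, LATENT_DIM)


def query(latents, t):
    """Decode the signal at continuous times t (in input-sample units)."""
    n = latents.shape[0]
    left = np.clip(np.floor(t).astype(int), 0, n - 2)
    right = left + 1
    offset = (t - left)[:, None]                  # time relative to left neighbor
    feats = np.concatenate([latents[left], latents[right], offset], axis=1)
    hidden = np.maximum(feats @ W1 + b1, 0.0)     # ReLU
    return (hidden @ W2 + b2)[:, 0]


def super_resolve(low_res, scale):
    """Resample low_res at any real-valued scale factor."""
    n_out = int(round(len(low_res) * scale))
    t = np.linspace(0.0, len(low_res) - 1, n_out)
    return query(encode(low_res), t)


low_res = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
print(super_resolve(low_res, 2.5).shape)
print(super_resolve(low_res, 3.0).shape)
```

Because the decoder takes a time coordinate as input, the same weights serve every scale factor; only the set of query times changes. With random weights the output values are meaningless, so the sketch demonstrates the interface rather than audio quality.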

Citation:

Pages: 3703-3707

Page count: 5
