Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration

Cited by: 26

Authors:

Deng, Jun [1]

Schuller, Bjoern [1]

Eyben, Florian [1]

Schuller, Dagmar [1]

Zhang, Zixing [1]

Francois, Holly [2]

Oh, Eunmi [3]

Affiliations:

[1] AudEERING GmbH, Gilching, Germany

[2] Samsung Res UK, Staines, England

[3] Samsung Res, Seoul, South Korea

Source:

NEURAL COMPUTING & APPLICATIONS | 2020 / Vol. 32 / Issue 04

Keywords:

Audio restoration; LSTM; MP3; Deep learning; Bandwidth extension; Telephone speech

DOI:

10.1007/s00521-019-04158-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Perceptual audio coding is widely and successfully used for audio compression. However, perceptual audio coders may introduce audible coding artifacts when encoding audio at low bitrates. Low-bitrate audio restoration is a challenging problem: it aims to recover a high-quality audio sample close to the uncompressed original from a low-quality encoded version. In this paper, we propose a novel data-driven method for audio restoration, in which temporal and spectral dynamics are explicitly captured by deep time-frequency LSTM recurrent neural networks. Leveraging the captured temporal and spectral information facilitates learning a nonlinear mapping from the magnitude spectrogram of low-quality audio to that of high-quality audio. The proposed method substantially attenuates audible artifacts caused by codecs and is conceptually straightforward. Extensive experiments show that for low-bitrate audio at 96 kbps (mono), 64 kbps (mono), and 96 kbps (stereo), the proposed method can efficiently generate improved-quality audio that is competitive with, or even superior in perceptual quality to, the audio produced by other state-of-the-art deep neural network methods and the LAME-MP3 codec.

Pages: 1095-1107

Number of pages: 13
