Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration

Cited by: 26
Authors
Deng, Jun [1]
Schuller, Bjoern [1]
Eyben, Florian [1]
Schuller, Dagmar [1]
Zhang, Zixing [1]
Francois, Holly [2]
Oh, Eunmi [3]
Affiliations
[1] AudEERING GmbH, Gilching, Germany
[2] Samsung Res UK, Staines, England
[3] Samsung Res, Seoul, South Korea
Keywords
Audio restoration; LSTM; MP3; Deep learning; Bandwidth extension; Telephone speech
DOI
10.1007/s00521-019-04158-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Perceptual audio coding is widely and successfully used for audio compression. However, perceptual audio coders may introduce audible coding artifacts when encoding audio at low bitrates. Low-bitrate audio restoration is a challenging problem: the goal is to recover a high-quality audio sample close to the uncompressed original from a low-quality encoded version. In this paper, we propose a novel data-driven method for audio restoration, in which temporal and spectral dynamics are explicitly captured by a deep time-frequency LSTM recurrent neural network. Leveraging the captured temporal and spectral information facilitates the task of learning a nonlinear mapping from the magnitude spectrogram of low-quality audio to that of high-quality audio. The proposed method substantially attenuates audible artifacts caused by codecs and is conceptually straightforward. Extensive experiments show that for low-bitrate audio at 96 kbps (mono), 64 kbps (mono), and 96 kbps (stereo), the proposed method can efficiently generate improved-quality audio that is competitive with, or even superior in perceptual quality to, the audio produced by other state-of-the-art deep neural network methods and the LAME-MP3 codec.
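The abstract frames restoration as a mapping between magnitude spectrograms. As a minimal sketch of that representation only, and not of the authors' pipeline or network, the following computes a magnitude spectrogram from a mono signal using a Hann window and a naive DFT. The function name, frame length, hop size, and test tone are all illustrative assumptions; none of them come from the paper.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(frame_len^2); an FFT routine would replace this in practice.
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a pure tone centred on DFT bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

In a setup like the one the abstract describes, a spectrogram of this kind computed from the decoded low-bitrate audio would be the model input, and the one computed from the uncompressed original would be the training target. In practice an FFT routine such as `numpy.fft.rfft` would take the place of the naive DFT loop.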
Pages: 1095-1107
Number of pages: 13