Exploring Audio Compression as Image Completion in Time-Frequency Domain

Cited by: 0
Authors
Scodeller, Giovanni [1]
Pistellato, Mara [1]
Bergamasco, Filippo [1]
Affiliation
[1] Univ CaFoscari Venezia, DAIS, 155 Via Torino, Venice, Italy
Source
IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II | 2023 / Vol. 14234
Keywords
Audio compression; CNN; Sparse convolutions; Spectrogram; genetic algorithm;
DOI
10.1007/978-3-031-43153-1_37
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Audio compression is usually achieved with algorithms that exploit spectral properties of the signal, such as frequency or temporal masking. In this paper we tackle the problem from a different point of view, treating the time-frequency representation of an audio signal as an intensity map to be reconstructed with a data-driven approach. The compression stage removes selected values from the time-frequency representation of the original signal; decompression then reconstructs the missing samples as an image completion task. Our work is divided into two main parts. First, we analyse the feasibility of data-driven audio reconstruction when samples are missing from the time-frequency representation. To do so, we exploit an existing CNN model designed for depth completion, which uses a sequence of sparse convolutions to deal with absent values. Second, we propose a method to select the values to be removed at the compression stage so as to maximize the perceived audio quality of the decompressed signal. In the experimental section we validate the proposed technique on standard audio datasets and provide an extensive study of the quality of the reconstructed signal under different conditions.
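The compress-then-complete idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' method: the random mask stands in for their value-selection step, and a mask-normalized neighbourhood average (loosely in the spirit of sparse convolutions) stands in for their CNN. All function names, the test signal, and the parameters (`frame_len`, `hop`, `keep_ratio`, `radius`) are assumptions made for illustration only.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency x time) using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def compress(spec, keep_ratio, rng):
    """Keep a random subset of bins (hypothetical stand-in for a learned selection)."""
    mask = rng.random(spec.shape) < keep_ratio
    return spec * mask, mask

def box_sum(a, radius):
    """Sum of each (2*radius+1)^2 neighbourhood, with zero padding at the borders."""
    padded = np.pad(a, radius)
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy : dy + h, dx : dx + w]
    return out

def masked_average_fill(sparse, mask, radius=1, n_iters=50):
    """Fill missing bins with the mean of their known neighbours.

    Dividing the neighbourhood sum by the count of known neighbours keeps
    absent values from dragging the average towards zero.
    """
    values = sparse.astype(float).copy()
    known = mask.astype(float).copy()
    for _ in range(n_iters):
        if known.all():
            break
        num = box_sum(values * known, radius)
        den = box_sum(known, radius)
        fill = (known == 0) & (den > 0)
        values[fill] = num[fill] / den[fill]
        known[fill] = 1.0
    return values

# Toy signal: two stationary sinusoids sampled at 16 kHz (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

spec = stft_magnitude(signal)
sparse, mask = compress(spec, keep_ratio=0.5, rng=rng)
restored = masked_average_fill(sparse, mask)

err_sparse = np.mean((sparse - spec) ** 2)      # error if missing bins stay zero
err_restored = np.mean((restored - spec) ** 2)  # error after neighbour filling
print(f"MSE zero-filled: {err_sparse:.2f}, MSE filled: {err_restored:.2f}")
```

The sketch works on magnitudes only and measures plain mean squared error; the paper instead targets perceived audio quality, so both the mask selection and the quality measure here are deliberately simplified.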
Pages: 443-455
Number of pages: 13
Related papers
50 in total
  • [21] AUDIO SOURCE SEPARATION WITH TIME-FREQUENCY VELOCITIES
    Wolf, Guy
    Mallat, Stephane
    Shamma, Shihab
    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2014,
  • [22] AUDIO CLASSIFICATION FROM TIME-FREQUENCY TEXTURE
    Yu, Guoshen
    Slotine, Jean-Jacques
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 1677 - +
  • [23] Audio watermarking using time-frequency characteristics
    Esmaili, S
    Krishnan, S
    Raahemifar, K
    CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING-REVUE CANADIENNE DE GENIE ELECTRIQUE ET INFORMATIQUE, 2003, 28 (02): : 57 - 61
  • [24] Audio denoising by time-frequency block thresholding
    Yu, Guoshen
    Mallat, Stephane
    Bacry, Emmanuel
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2008, 56 (05) : 1830 - 1839
  • [25] ROBUST UNDERDETERMINED BLIND AUDIO SOURCE SEPARATION OF SPARSE SIGNALS IN THE TIME-FREQUENCY DOMAIN
    Sbai, Si Mohamed Aziz
    Aissa-El-Bey, Abdeldjalil
    Pastor, Dominique
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 3716 - 3719
  • [26] Robust Audio Information Hiding Based on Stereo Phase Difference in Time-frequency Domain
    Ono, Nobutaka
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 260 - 263
  • [27] JOINT TIME-FREQUENCY SCATTERING FOR AUDIO CLASSIFICATION
    Anden, Joakim
    Lostanlen, Vincent
    Mallat, Stephane
    2015 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2015,
  • [28] Persistent Time-Frequency Shrinkage for Audio Denoising
    Siedenburg, Kai
    Doerfler, Monika
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2013, 61 (1-2): : 29 - 38
  • [29] Classification of Time-Frequency Regions in Stereo Audio
    Harma, Aki
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2011, 59 (10): : 707 - 720
  • [30] Target identification in the time-frequency domain
    Jouny, I
    Karunaratne, P
    Amin, M
    AUTOMATIC OBJECT RECOGNITION VI, 1996, 2756 : 249 - 260