Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation

Cited by: 7
Authors
Yang, Ning [1]
Usman, Muhammad [2]
He, Xiangjian [2]
Jan, Mian Ahmad [3]
Zhang, Liming [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
[2] Univ Technol Sydney, Sch Elect & Data Engn, Ultimo, NSW 2007, Australia
[3] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, Pakistan
[4] Univ Macau, Sch Comp Sci, Zhuhai 999078, Peoples R China
Source
IEEE ACCESS | 2017 / Vol. 5
Keywords
Blind Source Separation; Short Time Fourier Transform; OverLap-Add; SIR; SDR; BLIND SOURCE SEPARATION; FOURIER-TRANSFORM; BAYESIAN NONPARAMETRICS; SPEECH ENHANCEMENT; MIXTURES; SIGNALS; MODEL; STFT; DECOMPOSITION; FACTORIZATION;
DOI
10.1109/ACCESS.2017.2761741
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Blind Source Separation (BSS) techniques have long been used in wireless communication to extract signals of interest from a mixture of multiple signals without training data. In this paper, we investigate the problem of separating the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or part of a news broadcast with background music. This paper proposes a generalized Short Time Fourier Transform (STFT)-based technique, combined with a filter bank, to extract vocals from background music. The main purpose is to design a filter bank that eliminates background aliasing errors under the best reconstruction conditions, with approximated scaling factors. Stereo signals in the time-frequency domain are used in the experiments. The input stereo signals are processed frame by frame and passed through the proposed STFT-based technique. The output of the STFT-based technique is then passed through the filter bank to minimize the background aliasing errors. For reconstruction, an inverse STFT is applied first, and the signals are then reconstructed with the OverLap-Add method to obtain the final output, which contains vocals only. The experiments show that the proposed approach outperforms other state-of-the-art approaches in terms of both Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR).
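The abstract describes an analysis-synthesis chain: framing, STFT, a filtering stage, inverse STFT, and OverLap-Add reconstruction. That chain can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method.

The following are all illustrative choices: the square-root periodic Hann window, the 50% overlap, the frame length, and the hypothetical names `stft`, `istft`, and `simple_sdr_db`. The paper's filter bank is reduced here to a hypothetical per-bin gain vector. With an all-pass gain the chain returns the input, which is the kind of reconstruction property the filter-bank design has to preserve. `simple_sdr_db` is a plain energy ratio. It is not the full BSS Eval SDR/SIR decomposition normally used to score separation.

```python
import numpy as np


def stft(x, frame_len=1024, hop=512):
    """Frame a mono signal, apply a sqrt-Hann window, and rFFT each frame.

    Assumes hop <= frame_len // 2 (illustrative default: 50% overlap).
    Returns (spec, window, pad_start); spec has shape (n_frames, frame_len // 2 + 1).
    """
    n = np.arange(frame_len)
    # sqrt of a periodic Hann window: used at analysis AND synthesis,
    # so the effective window is Hann, which sums to 1 at 50% overlap.
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))
    # Zero-pad both ends so every original sample is covered by full overlap.
    pad_start = frame_len - hop
    pad_end = frame_len - hop + (-len(x)) % hop
    xp = np.concatenate([np.zeros(pad_start), x, np.zeros(pad_end)])
    n_frames = 1 + (len(xp) - frame_len) // hop
    frames = np.stack([xp[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1), window, pad_start


def istft(spec, window, hop, pad_start, length):
    """Inverse rFFT each frame, re-apply the window, and overlap-add (OLA)."""
    frame_len = len(window)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros((len(frames) - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop: i * hop + frame_len] += frame
        norm[i * hop: i * hop + frame_len] += window ** 2
    # Divide by the accumulated squared window so analysis + synthesis is exact.
    out = np.divide(out, norm, out=np.zeros_like(out), where=norm > 1e-10)
    return out[pad_start: pad_start + length]


def simple_sdr_db(reference, estimate):
    """Plain energy-ratio SDR in dB (NOT the full BSS Eval decomposition)."""
    err = np.sum((reference - estimate) ** 2)
    return 10 * np.log10(np.sum(reference ** 2) / max(err, 1e-20))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(44100)  # stand-in for one channel of audio
    spec, window, pad = stft(x, 1024, 512)
    gain = np.ones(spec.shape[1])  # hypothetical filter-bank gains (all-pass here)
    y = istft(spec * gain, window, 512, pad, len(x))
    print(np.max(np.abs(x - y)) < 1e-9)
    print(simple_sdr_db(x, y) > 100)
```

For a stereo input, one possible arrangement (again an assumption) is to run each channel through `stft`/`istft` separately. The all-pass `gain` would then be replaced by whatever separation mask or filter-bank response is under study.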
Pages: 27114-27125
Number of pages: 12
Related papers
50 in total
  • [31] ONLINE BLIND SOURCE SEPARATION BASED ON TIME-FREQUENCY SPARSENESS
    Loesch, Benedikt
    Yang, Bin
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 117 - 120
  • [32] A Time-Frequency Domain Underdetermined Blind Source Separation Algorithm for MIMO Radar Signals
    Guo, Qiang
    Ruan, Guoqing
    Liao, Yanping
    SYMMETRY-BASEL, 2017, 9 (07):
  • [33] Separation of Cardiorespiratory Sounds Using Time-Frequency Masking and Sparsity
    Shah, Ghafoor
    Papadias, Constantinos
    2013 18TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2013,
  • [34] Histogram of Gradients of Time-Frequency Representations for Audio Scene Classification
    Rakotomamonjy, Alain
    Gasso, Gilles
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 142 - 153
  • [35] Blind separation of coherent multipath signals with impulsive interference and Gaussian noise in time-frequency domain
    Xiao, Yiming
    Lu, Wenzhen
    Yan, Qinmengying
    Zhang, Haijian
    SIGNAL PROCESSING, 2021, 178
  • [36] Initialization method for speech separation algorithms that work in the time-frequency domain
    Sarmiento, Auxiliadora
    Duran-Diaz, Ivan
    Cruces, Sergio
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 127 (04) : EL121 - EL126
  • [37] A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter
    Ito, Nobutaka
    Ikeshita, Rintaro
    Sawada, Hiroshi
    Nakatani, Tomohiro
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1950 - 1965
  • [38] A sparse time-frequency reconstruction approach from the synchroextracting domain
    Chen, Xuping
    Chen, Hui
    Hu, Ying
    Xie, Yutao
    Wang, Siyuan
    SIGNAL PROCESSING, 2024, 222
  • [39] TIME-FREQUENCY CLUSTERING WITH WEIGHTED AND CONTEXTUAL INFORMATION FOR CONVOLUTIVE BLIND SOURCE SEPARATION
    Jafari, Ingrid
    Atcheson, Matt
    Togneri, Roberto
    Nordholm, Sven
    2014 IEEE WORKSHOP ON STATISTICAL SIGNAL PROCESSING (SSP), 2014, : 157 - 160
  • [40] Nonorthogonal joint diagonalization of spatial quadratic time-frequency matrices for source separation
    Giulieri, L
    Ghennioui, H
    Thirion-Moreau, N
    Moreau, E
    IEEE SIGNAL PROCESSING LETTERS, 2005, 12 (05) : 415 - 418