Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation

Cited by: 7
Authors
Yang, Ning [1]
Usman, Muhammad [2]
He, Xiangjian [2]
Jan, Mian Ahmad [3]
Zhang, Liming [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
[2] Univ Technol Sydney, Sch Elect & Data Engn, Ultimo, NSW 2007, Australia
[3] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, Pakistan
[4] Univ Macau, Sch Comp Sci, Zhuhai 999078, Peoples R China
Source
IEEE ACCESS | 2017 / Vol. 5
Keywords
Blind Source Separation; Short Time Fourier Transform; OverLap-Add; SIR; SDR; BLIND SOURCE SEPARATION; FOURIER-TRANSFORM; BAYESIAN NONPARAMETRICS; SPEECH ENHANCEMENT; MIXTURES; SIGNALS; MODEL; STFT; DECOMPOSITION; FACTORIZATION;
DOI
10.1109/ACCESS.2017.2761741
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Blind Source Separation techniques have long been widely used in wireless communication to extract signals of interest from a set of multiple signals without training data. In this paper, we investigate the problem of separating the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song, or part of a news broadcast with background music. This paper proposes a generalized Short Time Fourier Transform (STFT)-based technique, combined with a filter bank, to extract vocals from background music. The main purpose is to design a filter bank that eliminates background aliasing errors under the best reconstruction conditions, with approximated scaling factors. Stereo signals in the time-frequency domain are used in the experiments. The input stereo signals are processed as frames and passed through the proposed STFT-based technique. The output of the STFT-based technique is passed through the filter bank to minimize the background aliasing errors. For reconstruction, an inverse STFT is applied first, and the signals are then reconstructed by the OverLap-Add method to obtain the final output, which contains vocals only. The experiments show that the proposed approach outperforms other state-of-the-art approaches in terms of both Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR).
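The processing chain described in the abstract (framing, STFT, a time-frequency filtering stage, inverse STFT, overlap-add) can be illustrated with a minimal NumPy sketch. The paper's filter bank is not specified in this record, so `centre_mask` below is a hypothetical stand-in: it uses a common heuristic that keeps time-frequency bins where the two stereo channels agree, on the assumption that the vocal is panned to the centre. The function names, the frame length of 1024, the hop of 512, and the simplified `sdr_db` measure (plain energy ratio, not a full SIR/SDR decomposition) are likewise assumptions made for illustration.

```python
import numpy as np


def hann(frame_len):
    """Periodic Hann window; copies at 50% overlap sum to one."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)


def stft(x, frame_len=1024, hop=512):
    """Split x into overlapping windowed frames and take the FFT of each."""
    n_frames = int(np.ceil((len(x) + frame_len) / hop)) + 1
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[frame_len:frame_len + len(x)] = x  # zero padding on both sides
    window = hann(frame_len)
    frames = np.stack([padded[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, frame_len=1024, hop=512):
    """Inverse FFT of each frame, then overlap-add into a single signal."""
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    window = hann(frame_len)
    out = np.zeros((len(frames) - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + frame_len] += frame
        norm[k * hop:k * hop + frame_len] += window
    out /= np.maximum(norm, 1e-12)  # divide out the summed analysis windows
    return out[frame_len:frame_len + length]  # drop the padding


def centre_mask(spec_left, spec_right, eps=1e-12):
    """Hypothetical stand-in for the paper's filter bank (an assumption).

    Returns a value in [0, 1] per bin: close to 1 where both channels carry
    the same content, close to 0 where only one channel has energy.
    """
    num = 2.0 * np.real(spec_left * np.conj(spec_right))
    den = np.abs(spec_left) ** 2 + np.abs(spec_right) ** 2 + eps
    return np.clip(num / den, 0.0, 1.0)


def extract_centre(left, right, frame_len=1024, hop=512):
    """STFT both channels, mask the mid signal, and reconstruct by overlap-add."""
    spec_l = stft(left, frame_len, hop)
    spec_r = stft(right, frame_len, hop)
    mask = centre_mask(spec_l, spec_r)
    return istft(mask * 0.5 * (spec_l + spec_r), len(left), frame_len, hop)


def sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in dB (plain energy ratio)."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(err ** 2) + 1e-12))
```

Calling `extract_centre(left, right)` on two equal-length NumPy arrays returns a mono estimate of the centre-panned content. Dividing by the accumulated window in `istft` keeps the reconstruction exact for an all-pass mask, which makes it easier to attribute any remaining distortion to the masking stage rather than to the framing.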
Pages: 27114 - 27125
Page count: 12
Related papers
50 in total
  • [21] Environmental Sound Recognition With Time-Frequency Audio Features
    Chu, Selina
    Narayanan, Shrikanth
    Kuo, C. -C. Jay
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (06): : 1142 - 1158
  • [22] SPARSE DENOISING OF AUDIO BY GREEDY TIME-FREQUENCY SHRINKAGE
    Bhattacharya, Gautam
    Depalle, Philippe
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [23] An Implementation Approach For Ideal Time-Frequency Distribution
    Zhang, Liming
    Qian, Tao
    2014 19TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2014, : 114 - 118
  • [24] A combined approach using subspace and beamforming methods for time-frequency domain blind source separation
    Ichijo, Akihiro
    Hamada, Takehiro
    Tabaru, Tetsuya
    Nakano, Kazushi
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 1437 - +
  • [25] Audio Fingerprint Extraction Based on Time-Frequency Domain
    Liu, Zhengzheng
    Li, Cong
    Cao, Sanxing
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016, : 1975 - 1979
  • [26] Underdetermined blind separation of nondisjoint sources in the time-frequency domain
    Aissa-El-Bey, Abdeldjalil
    Linh-Trung, Nguyen
    Abed-Meraim, Karim
    Belouchrani, Adel
    Grenier, Yves
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (03) : 897 - 907
  • [27] Underdetermined source separation of EEG signals in the time-frequency domain
    Shan, Zeyong
    Swary, Jacob
    Aviyente, Selin
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3637 - 3640
  • [28] Multimodal (audio-visual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking
    Naqvi, S. Mohsen
    Wang, W.
    Khan, M. Salman
    Barnard, M.
    Chambers, J. A.
    IET SIGNAL PROCESSING, 2012, 6 (05) : 466 - 477
  • [29] Blind separation of frequency-hopping signals based on time-frequency distribution
    Feng T.
    Yuan C.-W.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2010, 32 (05): : 900 - 903
  • [30] A time-frequency blind signal separation method applicable to underdetermined mixtures of dependent sources
    Abrard, F
    Deville, Y
    SIGNAL PROCESSING, 2005, 85 (07) : 1389 - 1403