Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation

Cited by: 7
Authors
Yang, Ning [1]
Usman, Muhammad [2]
He, Xiangjian [2]
Jan, Mian Ahmad [3]
Zhang, Liming [4]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
[2] Univ Technol Sydney, Sch Elect & Data Engn, Ultimo, NSW 2007, Australia
[3] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, Pakistan
[4] Univ Macau, Sch Comp Sci, Zhuhai 999078, Peoples R China
Source
IEEE ACCESS | 2017 / Vol. 5
Keywords
Blind Source Separation; Short Time Fourier Transform; OverLap-Add; SIR; SDR; BLIND SOURCE SEPARATION; FOURIER-TRANSFORM; BAYESIAN NONPARAMETRICS; SPEECH ENHANCEMENT; MIXTURES; SIGNALS; MODEL; STFT; DECOMPOSITION; FACTORIZATION;
DOI
10.1109/ACCESS.2017.2761741
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Blind Source Separation techniques have long been widely used in wireless communication to extract signals of interest from a set of multiple signals without training data. In this paper, we investigate the problem of separating the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or part of a news broadcast with background music. This paper proposes a generalized Short Time Fourier Transform (STFT)-based technique, combined with a filter bank, to extract vocals from background music. The main purpose is to design a filter bank that eliminates background aliasing errors under the best reconstruction conditions, with approximated scaling factors. Stereo signals in the time-frequency domain are used in the experiments. The input stereo signals are processed frame by frame and passed through the proposed STFT-based technique, whose output is then passed through the filter bank to minimize the background aliasing errors. For reconstruction, an inverse STFT is applied first, and the signals are then reconstructed by the OverLap-Add method to obtain the final output, which contains vocals only. The experiments show that the proposed approach performs better than other state-of-the-art approaches in terms of both Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR).
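The processing chain described in the abstract (framing, STFT, a time-frequency filtering stage, inverse STFT, overlap-add) can be illustrated with a minimal NumPy sketch. This is not the paper's filter bank: the abstract does not specify its design, so the filtering stage here is a simple stand-in that assumes the vocal is panned to the centre of the stereo image and keeps only the bins where the left and right channels agree. All function names, the frame length, the hop size, and the threshold are hypothetical choices made for illustration, and `sdr_db` is a simplified energy ratio rather than the full SDR definition used in separation benchmarks.

```python
import numpy as np

def _hann(frame_len):
    # Periodic Hann window; its squared copies sum to a constant at hop = frame_len / 4.
    return np.hanning(frame_len + 1)[:-1]

def stft(x, frame_len=1024, hop=256):
    """Split x into windowed frames and take the real FFT of each frame."""
    window = _hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=1024, hop=256):
    """Inverse FFT per frame, then overlap-add with squared-window normalisation."""
    window = _hann(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    n = hop * (len(frames) - 1) + frame_len
    out = np.zeros(n)
    norm = np.zeros(n)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
        norm[i * hop:i * hop + frame_len] += window ** 2
    # The first and last few samples have almost no window weight and are not recovered.
    return out / np.maximum(norm, 1e-8)

def center_mask(left_spec, right_spec, threshold=0.8):
    """Assumed stand-in for the filtering stage: keep bins where both channels agree."""
    diff = np.abs(left_spec - right_spec)
    total = np.abs(left_spec) + np.abs(right_spec) + 1e-12
    similarity = 1.0 - diff / total
    return (similarity > threshold).astype(float)

def extract_center(left, right, frame_len=1024, hop=256, threshold=0.8):
    """STFT both channels, mask the mid signal, and resynthesise by overlap-add."""
    left_spec = stft(left, frame_len, hop)
    right_spec = stft(right, frame_len, hop)
    mid_spec = 0.5 * (left_spec + right_spec)
    mask = center_mask(left_spec, right_spec, threshold)
    return istft(mask * mid_spec, frame_len, hop)

def sdr_db(reference, estimate):
    """Simplified distortion measure: reference energy over error energy, in dB."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-20))

if __name__ == "__main__":
    sr = 16000
    t = np.arange(16384) / sr
    vocal = np.sin(2 * np.pi * 440.0 * t)          # synthetic centre-panned "voice"
    music = 0.8 * np.sin(2 * np.pi * 3000.0 * t)   # synthetic "instrument", left only
    left, right = vocal + music, vocal.copy()
    estimate = extract_center(left, right)
    inner = slice(1024, -1024)                     # skip the edge frames
    print("SDR of plain mid signal (dB):", sdr_db(vocal[inner], 0.5 * (left + right)[inner]))
    print("SDR after masking (dB):", sdr_db(vocal[inner], estimate[inner]))
```

The overlap-add step divides by the summed squared window so that an unmodified STFT is reconstructed exactly away from the signal edges; any masking applied in between is then the only source of change in the output.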
Pages: 27114-27125
Page count: 12