Unsupervised Interpretable Representation Learning for Singing Voice Separation

Cited by: 0
Authors
Mimilakis, Stylianos I. [1]
Drossos, Konstantinos [2]
Schuller, Gerald [3]
Affiliations
[1] Fraunhofer IDMT, Semantic Music Technologies Group, Ilmenau, Germany
[2] Tampere University, Audio Research Group, Tampere, Finland
[3] Technische Universität Ilmenau, Applied Media Systems Group, Ilmenau, Germany
Source
28th European Signal Processing Conference (EUSIPCO 2020) | 2021
Keywords
representation learning; unsupervised learning; denoising auto-encoders; singing voice separation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on a denoising auto-encoder model that uses a simple sinusoidal model as its decoding function to reconstruct the singing voice. To demonstrate the benefits of our method, we apply the obtained representations to the task of informed singing voice separation via binary masking and measure the resulting separation quality by means of the scale-invariant signal-to-distortion ratio (SI-SDR). Our findings suggest that our method is capable of learning meaningful representations for singing voice separation while preserving conveniences of the short-time Fourier transform, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking, that are desired in audio and music source separation.
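The abstract names two evaluation ingredients: binary masking and SI-SDR. The Python sketch below illustrates both under stated assumptions. SI-SDR follows its commonly used definition (zero-mean signals, with the reference optimally rescaled to match the estimate), and "informed" masking is assumed to mean a mask computed from the known voice and accompaniment representations. The arrays are random placeholders standing in for non-negative representations, and the helper names `si_sdr` and `ideal_binary_mask` are hypothetical. This is not the authors' implementation, and decoding the masked representation back to a waveform is omitted.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB."""
    # Remove the mean so DC offsets do not influence the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return float(10.0 * np.log10(ratio))


def ideal_binary_mask(voice_rep: np.ndarray, accomp_rep: np.ndarray) -> np.ndarray:
    """Hypothetical informed mask: 1 where the voice dominates, 0 elsewhere."""
    return (voice_rep > accomp_rep).astype(voice_rep.dtype)


rng = np.random.default_rng(0)

# SI-SDR is invariant to rescaling: a scaled copy scores very high.
reference = rng.standard_normal(16000)
clean_score = si_sdr(0.5 * reference, reference)

# Additive noise at 1/10 of the reference amplitude gives roughly 20 dB.
noisy_score = si_sdr(reference + 0.1 * rng.standard_normal(16000), reference)

# Placeholder non-negative representations (e.g. 513 channels x 100 frames).
voice_rep = np.abs(rng.standard_normal((513, 100)))
accomp_rep = np.abs(rng.standard_normal((513, 100)))
mixture_rep = voice_rep + accomp_rep  # assumption: mixture taken as the sum
mask = ideal_binary_mask(voice_rep, accomp_rep)
voice_estimate_rep = mask * mixture_rep

print(f"scaled copy: {clean_score:.1f} dB, noisy copy: {noisy_score:.1f} dB")
```

A perfectly rescaled estimate scores very high (limited only by `eps`), which is the scale invariance the metric is named for. Additive noise at one tenth of the reference amplitude yields roughly 20 dB.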
Pages: 1412-1416
Number of pages: 5
Related papers
50 in total
  • [1] Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation
    Yuan, Weitao
    Wang, Shengbei
    Wang, Jianming
    Unoki, Masashi
    Wang, Wenwu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3206 - 3220
  • [2] Unsupervised Singing Voice Detection Using Dictionary Learning
    Pikrakis, Aggelos
    Kopsinis, Yannis
    Kroher, Nadine
    Diaz-Banez, Jose-Miguel
    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1212 - 1216
  • [3] High-Resolution Representation Learning and Recurrent Neural Network for Singing Voice Separation
    Bhattarai, Bhuwan
    Pandeya, Yagya Raj
    Jie, You
    Lamichhane, Arjun Kumar
    Lee, Joonwhoan
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (02) : 1083 - 1104
  • [4] Unsupervised Singing Voice Conversion
    Nachmani, Eliya
    Wolf, Lior
    INTERSPEECH 2019, 2019, : 2583 - 2587
  • [5] Informed Group-Sparse Representation for Singing Voice Separation
    Chan, Tak-Shing T.
    Yang, Yi-Hsuan
    IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (02) : 156 - 160
  • [6] Hierarchical Disentangled Representation Learning for Singing Voice Conversion
    Takahashi, Naoya
    Singh, Mayank Kumar
    Mitsufuji, Yuki
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] Monophonic Singing Voice Separation Based on Deep Learning
    Wang, Yutian
    Zhang, Zhao
    Wang, Zheng
    Cai, JuanJuan
    Wang, Hui
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 491 - 495
  • [8] Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation
    Zhao, Tiancheng
    Lee, Kyusong
    Eskenazi, Maxine
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 1098 - 1107
  • [9] PPG-Based Singing Voice Conversion with Adversarial Representation Learning
    Li, Zhonghao
    Tang, Benlai
    Yin, Xiang
    Wan, Yuan
    Xu, Ling
    Shen, Chen
    Ma, Zejun
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7073 - 7077