Unsupervised Interpretable Representation Learning for Singing Voice Separation

Cited by: 0

Authors:

Mimilakis, Stylianos I. [1]

Drossos, Konstantinos [2]

Schuller, Gerald [3]

Affiliations:

[1] Fraunhofer IDMT, Semant Mus Techn Grp, Ilmenau, Germany

[2] Tampere Univ, Audio Res Grp, Tampere, Finland

[3] Tech Univ Ilmenau, Appl Media Syst Grp, Ilmenau, Germany

Source:

28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020) | 2021

Keywords:

representation learning; unsupervised learning; denoising auto-encoders; singing voice separation

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on the denoising auto-encoder model that uses a simple sinusoidal model as decoding functions to reconstruct the singing voice. To demonstrate the benefits of our method, we apply the obtained representations to the task of informed singing voice separation via binary masking, and measure the obtained separation quality by means of the scale-invariant signal-to-distortion ratio. Our findings suggest that our method is capable of learning meaningful representations for singing voice separation, while preserving conveniences of the short-time Fourier transform, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking, which are desired in audio and music source separation.
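As an illustration of the two evaluation ingredients named in the abstract, the following is a minimal sketch of the commonly used definition of the scale-invariant signal-to-distortion ratio and of a binary mask computed from two non-negative representations. It is not the authors' code: the function names `si_sdr` and `binary_mask`, the `eps` stabilizer, and the use of NumPy are assumptions made for this example, and the signals are not mean-normalized here.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (illustrative helper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Project the estimate onto the reference to find the optimal scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))


def binary_mask(target_mag, interference_mag):
    """Return 1.0 where the target representation dominates, 0.0 elsewhere."""
    target_mag = np.asarray(target_mag, dtype=np.float64)
    interference_mag = np.asarray(interference_mag, dtype=np.float64)
    return (target_mag > interference_mag).astype(np.float64)
```

In an informed separation setting, such a mask would be multiplied element-wise with the non-negative representation of the mixture before decoding back to a waveform; the decoding step itself depends on the learned model and is not sketched here.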

Pages: 1412-1416

Page count: 5
