Joint Chord and Key Estimation Based on a Hierarchical Variational Autoencoder with Multi-task Learning

Cited by: 0
Authors
Wu, Yiming [1]
Yoshii, Kazuyoshi [1,2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan
Keywords
Automatic chord estimation; automatic key estimation; variational autoencoder; multi-task learning; audio
DOI
10.1561/116.00000052
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification
0808; 0809
Abstract
This paper describes a deep generative approach to joint chord and key estimation for music signals. The limited number of music signals with complete annotations has been the major bottleneck in supervised multi-task learning of a classification model. To overcome this limitation, we integrate the supervised multi-task learning approach with the unsupervised autoencoding approach in a mutually complementary manner. Considering the typical process of music composition, we formulate a hierarchical latent variable model that sequentially generates keys, chords, and chroma vectors. The keys and chords are assumed to follow a language model that represents their relationships and dynamics. In the framework of amortized variational inference (AVI), we introduce a classification model that jointly infers discrete chord and key labels and a recognition model that infers continuous latent features. These models are combined to form a variational autoencoder (VAE) and are trained jointly in a (semi-)supervised manner, where the generative and language models act as regularizers for the classification model. We comprehensively investigate three different architectures for the chord and key classification model and three different architectures for the language model. Experimental results demonstrate that VAE-based multi-task learning improves both chord and key estimation.
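As a rough illustration of how the classification, recognition, and generative models fit together, the following PyTorch sketch implements the supervised branch of a semi-supervised VAE of this kind. It is a minimal sketch under simplifying assumptions: single-frame chroma input, a Gaussian likelihood, 24 key and 25 chord classes, and fully connected layers. All class, layer, and dimension choices here are hypothetical and are not the paper's architecture; in particular, the language-model prior over key/chord sequences that the paper uses as an additional regularizer is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChordKeyVAE(nn.Module):
    """Illustrative semi-supervised VAE over chroma frames.

    Latent variables: key s and chord y (discrete), feature z (continuous).
    Generative model p(x | s, y, z); classification model q(s, y | x);
    recognition model q(z | x, s, y).
    """

    def __init__(self, n_chroma=12, n_keys=24, n_chords=25, z_dim=16, h_dim=128):
        super().__init__()
        self.n_keys, self.n_chords = n_keys, n_chords
        # classification model q(s, y | x): shared trunk, two label heads
        self.trunk = nn.Sequential(nn.Linear(n_chroma, h_dim), nn.ReLU())
        self.key_head = nn.Linear(h_dim, n_keys)
        self.chord_head = nn.Linear(h_dim, n_chords)
        # recognition model q(z | x, s, y): Gaussian posterior over z
        self.enc = nn.Linear(n_chroma + n_keys + n_chords, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # generative model p(x | s, y, z): decodes back to a chroma vector
        self.dec = nn.Sequential(
            nn.Linear(z_dim + n_keys + n_chords, h_dim), nn.ReLU(),
            nn.Linear(h_dim, n_chroma))

    def supervised_loss(self, x, s, y):
        """Negative ELBO plus classification cross-entropy for labeled frames."""
        s1 = F.one_hot(s, self.n_keys).float()
        y1 = F.one_hot(y, self.n_chords).float()
        h = F.relu(self.enc(torch.cat([x, s1, y1], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.dec(torch.cat([z, s1, y1], dim=-1))
        recon = F.mse_loss(x_hat, x, reduction="none").sum(-1)  # Gaussian log-likelihood up to a constant
        kl = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        hc = self.trunk(x)  # the ELBO terms regularize these classifier heads
        ce = (F.cross_entropy(self.key_head(hc), s, reduction="none")
              + F.cross_entropy(self.chord_head(hc), y, reduction="none"))
        return (recon + kl + ce).mean()

# usage: one gradient step on a random batch of 8 chroma frames
model = ChordKeyVAE()
x = torch.rand(8, 12)
s = torch.randint(0, 24, (8,))
y = torch.randint(0, 25, (8,))
loss = model.supervised_loss(x, s, y)
loss.backward()

Here the reconstruction and KL terms form the negative ELBO for a labeled frame, while the cross-entropy terms are the supervised classification objective. In the unsupervised branch, an expectation over q(s, y | x) would replace the ground-truth labels, which is where the generative and language models act as regularizers for the classifier.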
Pages: 27