DEEP PERFORMER: SCORE-TO-AUDIO MUSIC PERFORMANCE SYNTHESIS

Cited by: 10
Authors
Dong, Hao-Wen [1 ,2 ]
Zhou, Cong [1 ]
Berg-Kirkpatrick, Taylor [2 ]
McAuley, Julian [2 ]
Affiliations
[1] Dolby Labs, London, England
[2] Univ Calif San Diego, La Jolla, CA 92093 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Audio synthesis; computer music; music information retrieval; machine learning; neural network;
DOI
10.1109/ICASSP43922.2022.9747217
CLC Number
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer, a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing fine-grained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores, along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre, and noise level. Moreover, our proposed model significantly outperforms the baseline on an existing piano dataset in overall quality.
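The abstract's notion of "fine-grained conditioning" for polyphonic input can be pictured as expanding note-level score information to the frame rate of the audio a decoder generates. The sketch below is illustrative only, not the paper's actual method: the `Note` fields and frame rate are assumptions chosen for demonstration.

```python
# Illustrative sketch (assumed representation, not the Deep Performer's
# exact pipeline): flatten a polyphonic score into a frame-level
# conditioning sequence, so that every audio frame knows which pitches
# should be sounding. A transformer decoder could then attend to (or be
# conditioned on) this per-frame signal.

from dataclasses import dataclass


@dataclass
class Note:
    onset: float      # note start time in seconds
    duration: float   # note length in seconds
    pitch: int        # MIDI pitch number


def frame_conditioning(notes, fps=10, total=None):
    """Return, for each audio frame, the set of MIDI pitches sounding."""
    if total is None:
        total = max(n.onset + n.duration for n in notes)
    n_frames = int(round(total * fps))
    frames = [set() for _ in range(n_frames)]
    for n in notes:
        start = int(n.onset * fps)
        end = min(n_frames, int(round((n.onset + n.duration) * fps)))
        for t in range(start, end):
            frames[t].add(n.pitch)
    return frames


# A two-note chord (C4 + E4) followed by a single long G4:
score = [Note(0.0, 0.5, 60), Note(0.0, 0.5, 64), Note(0.5, 1.0, 67)]
frames = frame_conditioning(score, fps=4)
# frames[0] contains both chord pitches; later frames hold only the G4.
```

Note that, unlike monophonic speech, several notes can occupy the same frame here, which is exactly the polyphony issue the abstract highlights.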
Pages: 951 - 955
Page count: 5