Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control

Cited by: 0
Authors
Koguchi, Junya [1]
Morise, Masanori [1]
Affiliations
[1] Meiji Univ, Grad Sch Adv Math Sci, 4-21-1 Nakano, Tokyo 1648525, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Musical instrument sound synthesis; Playing technique; Electric bass guitar; Phoneme; Deep neural networks;
DOI
10.1186/s13636-024-00327-9
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Musical instrument sound synthesis (MISS) often uses a text-to-speech framework because, like speech synthesis, it generates sounds from symbols. Moreover, a plucked string instrument such as the electric bass guitar (EBG) shares acoustical similarities with speech. We propose an attack-sustain (AS) representation of the playing technique to take advantage of this similarity. The AS representation treats the attack segment as an unvoiced consonant and the sustain segment as a voiced vowel. In addition, we propose a MISS framework for an EBG that can control its playing techniques: (1) we constructed an EBG sound database containing a rich set of playing techniques, (2) we developed a dynamic time warping and timbre conversion method to align the sounds and AS labels, and (3) we extended an existing MISS framework to control playing techniques using the AS representation as control symbols. The experimental evaluation suggests that our AS representation effectively controls the playing techniques and improves the naturalness of the synthetic sound.
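As a rough illustration of the AS representation described in the abstract, the sketch below expands each note into an attack symbol followed by a sustain symbol, much as a text-to-speech front end expands a syllable into a consonant and a vowel. The technique names, the symbol tables, the `Note` class, and the `to_as_labels` function are all hypothetical and chosen only for this example; the record does not list the paper's actual label inventory.

```python
from dataclasses import dataclass

# Hypothetical symbol tables: the technique names and symbols below are
# assumptions for illustration, not the label set used in the paper.
ATTACK_SYMBOLS = {"finger": "f", "pick": "p", "slap": "s"}   # consonant-like
SUSTAIN_SYMBOLS = {"normal": "N", "mute": "M"}               # vowel-like


@dataclass
class Note:
    pitch: int     # MIDI note number
    attack: str    # technique used to excite the string
    sustain: str   # technique applied while the note rings


def to_as_labels(notes):
    """Expand each note into an (attack symbol, pitch) entry followed by
    a (sustain symbol, pitch) entry, in playing order."""
    labels = []
    for note in notes:
        labels.append((ATTACK_SYMBOLS[note.attack], note.pitch))
        labels.append((SUSTAIN_SYMBOLS[note.sustain], note.pitch))
    return labels


phrase = [Note(40, "finger", "normal"), Note(43, "slap", "mute")]
as_labels = to_as_labels(phrase)
```

A label sequence of this shape is the kind of symbolic input a text-to-speech style acoustic model could consume once each symbol has been aligned to its audio segment.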
Pages: 10