Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control

Citations: 0

Authors:

Koguchi, Junya [1]

Morise, Masanori [1]

Affiliation:

[1] Meiji Univ, Grad Sch Adv Math Sci, 4-21-1 Nakano, Tokyo 1648525, Japan

Source:

EURASIP Journal on Audio, Speech, and Music Processing | 2024, Vol. 2024, No. 1

Funding:

Japan Society for the Promotion of Science;

Keywords:

Musical instrument sound synthesis; Playing technique; Electric bass guitar; Phoneme; Deep neural networks;

DOI:

10.1186/s13636-024-00327-9

Chinese Library Classification (CLC):

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

Musical instrument sound synthesis (MISS) often utilizes a text-to-speech framework because, like speech synthesis, it generates sounds from symbols. Moreover, a plucked string instrument, such as the electric bass guitar (EBG), shares acoustical similarities with speech. We propose an attack-sustain (AS) representation of the playing technique to take advantage of this similarity. The AS representation treats the attack segment as an unvoiced consonant and the sustain segment as a voiced vowel. In addition, we propose a MISS framework for an EBG that can control its playing techniques: (1) we constructed an EBG sound database containing a rich set of playing techniques, (2) we developed a dynamic time warping and timbre conversion method to align the sounds and AS labels, and (3) we extended an existing MISS framework to control playing techniques using the AS representation as control symbols. The experimental evaluation suggests that our AS representation effectively controls the playing techniques and improves the naturalness of the synthetic sound.
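As a rough illustration of the alignment idea mentioned in the abstract, the sketch below applies textbook dynamic time warping (DTW) to two one-dimensional feature sequences and carries frame-level attack ("A") and sustain ("S") labels from a labeled reference over to an unlabeled target. This is not the authors' implementation: the abstract does not specify the features, the distance measure, or the timbre conversion step, so the function names `dtw_path` and `transfer_labels`, the absolute-difference cost, and the toy feature values are all assumptions made for illustration.

```python
from math import inf


def dtw_path(x, y):
    """Classic DTW between two 1-D sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs meaning frame i of x is aligned with frame j of y.
    The absolute difference is an assumed, illustrative local cost.
    """
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both frames
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
            )

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def transfer_labels(path, ref_labels, n_target):
    """Copy frame labels from the reference (x) onto the target (y).

    If several reference frames map to one target frame, the label of
    the last aligned reference frame is kept (an arbitrary choice).
    """
    labels = [None] * n_target
    for i, j in path:
        labels[j] = ref_labels[i]
    return labels


# Toy example: hypothetical energy-like features, not real EBG data.
reference = [1, 1, 2]
reference_labels = ["A", "A", "S"]  # attack, attack, sustain
target = [1, 2]

total_cost, path = dtw_path(reference, target)
target_labels = transfer_labels(path, reference_labels, len(target))
```

In practice the sequences would be multi-dimensional acoustic features rather than scalars, and the local cost would be a vector distance; the recursion and backtracking stay the same.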


Pages: 10

