Expressive control of singing voice synthesis using musical contexts and a parametric F0 model

Cited by: 4
Authors
Ardaillon, Luc [1 ]
Chabot-Canet, Celine [1 ]
Roebel, Axel [1 ]
Affiliations
[1] Sorbonne Univ, CNRS, IRCAM, UMR,STMS, Paris, France
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
singing voice synthesis; singing style; F0; model;
DOI
10.21437/Interspeech.2016-1317
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Expressive singing voice synthesis requires appropriate control of both prosodic and timbral aspects. While intuitive control over the expressive parameters is desirable, synthesis systems should also be able to produce convincing results directly from a score. As countless interpretations of the same score are possible, the system should target a particular singing style, which implies mimicking the various strategies used by different singers. Among the control parameters involved, the pitch (F0) should be modeled first. In previous work, a parametric F0 model with intuitive controls was proposed, but no automatic way to choose the model parameters was given. In the present work, we propose a new approach for modeling singing style, based on parametric template selection. In this approach, the F0 parameters and phoneme durations are extracted from annotated recordings, along with a rich description of contextual information, and stored to form a database of parametric templates. This database is then used to build a model of the singing style using decision trees. At the synthesis stage, appropriate parameters are selected according to the target contexts. The results produced by this approach have been evaluated by means of a listening test.
Pages: 1250-1254
Number of pages: 5
Related papers
9 items in total
  • [1] Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis
    Saitou, T
    Unoki, M
    Akagi, M
    SPEECH COMMUNICATION, 2005, 46 (3-4) : 405 - 417
  • [2] A multi-layer F0 model for singing voice synthesis using a B-spline representation with intuitive controls
    Ardaillon, Luc
    Degottex, Gilles
    Roebel, Axel
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3375 - +
  • [3] Parameter Estimation Method of F0 Control Model for Singing Voices
    Ohishi, Yasunori
    Kameoka, Hirokazu
    Kashino, Kunio
    Takeda, Kazuya
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 139 - +
  • [4] Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet
    Wada, Yusuke
    Nishikimi, Ryo
    Nakamura, Eita
    Itoyama, Katsutoshi
    Yoshii, Kazuyoshi
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 983 - 989
  • [5] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
    Jinfu Ni
    Yoshinori Shiga
    Chiori Hori
    Journal of Signal Processing Systems, 2016, 82 : 273 - 286
  • [6] Synthesis of F0 contours using generation process model parameters predicted from unlabeled corpora: application to emotional speech synthesis
    Hirose, K
    Sato, K
    Asano, Y
    Minematsu, N
    SPEECH COMMUNICATION, 2005, 46 (3-4) : 385 - 404
  • [7] An RNN-based Quantized F0 Model with Multi-tier Feedback Links for Text-to-Speech Synthesis
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1059 - 1063
  • [8] Voice quality control using perceptual expressions for statistical parametric speech synthesis based on cluster adaptive training
    Ohtani, Yamato
    Mori, Koichiro
    Morita, Masahiro
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2258 - 2262
  • [9] EfficientSing: A Chinese Singing Voice Synthesis System Using Duration-Free Acoustic Model and HiFi-GAN Vocoder
    Liu, Zhengchen
    Miao, Chenfeng
    Zhu, Qingying
    Chen, Minchuan
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2021, 2021, : 1609 - 1613