A novel approach to the estimation of voice source and vocal tract parameters from speech signals

Cited by: 0
Authors
Ding, W
Kasuya, H
Affiliation
Source
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 | 1996
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a novel adaptive pitch-synchronous analysis method for simultaneously estimating voice source and vocal tract (formant/antiformant) parameters from the speech signal. The method uses a parametric Rosenberg-Klatt model to generate the glottal waveform and an autoregressive with exogenous input (ARX) model to represent the speech production process. The time-varying coefficients of the ARX model are estimated with an adaptive algorithm based on the Kalman filter, while the parameters of the Rosenberg-Klatt model are optimized by simulated annealing. In addition, a new hybrid error criterion is used to optimize the glottal opening instant. The fundamental period T0 is defined as the interval between two successive glottal closure instants and is estimated automatically from the obtained differentiated glottal waveform. Experiments on two-channel recordings (speech and electroglottograph (EGG) signals) and on continuous speech show good estimation performance.
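As a rough illustration of the two models named in the abstract, the sketch below generates one period of a Rosenberg-Klatt style differentiated glottal waveform and passes a pulse train through a generic ARX difference equation. It is not the authors' implementation: the function names, the open quotient `oq`, the amplitude `av`, the sign convention of the ARX equation, and the single-resonance coefficients are all assumptions made for illustration, and the Kalman-filter and simulated-annealing estimation steps are not shown.

```python
import numpy as np

def rk_flow_derivative(period, oq=0.6, av=1.0):
    """One period (in samples) of a Rosenberg-Klatt style flow derivative.

    Assumed polynomial form: flow u(t) = a*t**2 - b*t**3 during the open
    phase 0 <= t < oq*period, zero during the closed phase.
    """
    n = np.arange(period, dtype=float)
    a = 27.0 * av / (4.0 * oq**2 * period)
    b = 27.0 * av / (4.0 * oq**3 * period**2)
    g = 2.0 * a * n - 3.0 * b * n**2   # derivative of a*t^2 - b*t^3
    g[n >= oq * period] = 0.0          # closed phase: no flow change
    return g

def arx_synthesize(u, a, b, e=None):
    """Assumed ARX convention:
    s[n] = -sum_i a[i-1]*s[n-i] + sum_j b[j]*u[n-j] + e[n]
    """
    u = np.asarray(u, dtype=float)
    e = np.zeros_like(u) if e is None else np.asarray(e, dtype=float)
    s = np.zeros_like(u)
    for n in range(len(u)):
        acc = e[n]
        for j, bj in enumerate(b):           # exogenous (source) part
            if n - j >= 0:
                acc += bj * u[n - j]
        for i, ai in enumerate(a, start=1):  # autoregressive part
            if n - i >= 0:
                acc -= ai * s[n - i]
        s[n] = acc
    return s

# Hypothetical usage: 5 pitch periods of 80 samples, one resonance at
# 500 Hz for an assumed 8 kHz sampling rate.
period = 80
source = np.tile(rk_flow_derivative(period), 5)
r, theta = 0.95, 2.0 * np.pi * 500.0 / 8000.0
a_coeffs = [-2.0 * r * np.cos(theta), r * r]
speech = arx_synthesize(source, a_coeffs, b=[1.0])
```

In an analysis setting the direction is reversed: given the speech, the coefficients `a` and `b` would be tracked over time and the source parameters searched for, which is where the abstract's Kalman filter and simulated annealing come in.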
Pages: 1257 - 1260
Number of pages: 4
Related papers
50 in total
  • [1] Fast and robust joint estimation of vocal tract and voice source parameters
    Ding, W
    Campbell, N
    Higuchi, N
    Kasuya, H
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1291 - 1294
  • [2] SIMULTANEOUS ESTIMATION OF VOCAL-TRACT AND VOICE SOURCE PARAMETERS BASED ON AN ARX MODEL
    DING, W
    KASUYA, H
    ADACHI, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (06) : 738 - 743
  • [3] Estimation of glottal source waveforms and vocal tract shapes from speech signals based on ARX-LF model
    Li, Yongwei
    Sakakibara, Ken-Ichi
    Akagi, Masato
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 230 - 234
  • [4] ESTIMATION OF VOCAL TRACT PARAMETERS FOR THE CLASSIFICATION OF SPEECH UNDER STRESS
    Yao, Xiao
    Jitsuhiro, Takatoshi
    Miyajima, Chiyomi
    Kitaoka, Norihide
    Takeda, Kazuya
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7532 - 7536
  • [5] A Precise Estimation of Vocal Tract Parameters for High Quality Voice Morphing
    Xu, Ning
    Yang, Zhen
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 684 - 687
  • [6] Simultaneous Estimation of Glottal Source Waveforms and Vocal Tract Shapes from Speech Signals Based on ARX-LF Model
    Li, Yongwei
    Sakakibara, Ken-Ichi
    Akagi, Masato
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2020, 92 (08): : 831 - 838
  • [7] Simultaneous Estimation of Glottal Source Waveforms and Vocal Tract Shapes from Speech Signals Based on ARX-LF Model
    Yongwei Li
    Ken-Ichi Sakakibara
    Masato Akagi
    Journal of Signal Processing Systems, 2020, 92 : 831 - 838
  • [8] Voice Source and Vocal Tract Variations as Cues to Emotional States Perceived from Expressive Conversational Speech
    Mori, Hiroki
    Kasuya, Hideki
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 625 - +
  • [9] Measuring variations of voice source and vocal tract characteristics from Korean emotional voice
    Jo, Cheolwoo
    Wang, Jianglin
    ISDA 2006: SIXTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 2, 2006, : 800 - +
  • [10] Extraction of vocal-tract system characteristics from speech signals
    Indian Inst of Technology, Madras, India
    IEEE Trans Speech Audio Process, 4 (313-327):