Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks

Cited by: 5
Authors
Kato, Akihiro [1]
Kinnunen, Tomi H. [2]
Affiliations
[1] Ricoh Co Ltd, Ricoh Inst Technol, Ebina, Kanagawa 2430460, Japan
[2] Univ Eastern Finland, Sch Comp, FI-80101 Joensuu, Finland
Funding
Academy of Finland;
Keywords
Estimation; Hidden Markov models; Speech processing; Noise robustness; Task analysis; Recurrent neural networks; Fundamental frequency; F0; pitch; waveform-to-sinusoid regression; regression model; recurrent neural networks; FUNDAMENTAL-FREQUENCY; MULTIPITCH TRACKING; SPEECH; PERFORMANCE; PREDICTION; ALGORITHM; LSTM;
DOI
10.1109/TASLP.2019.2945489
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The fundamental frequency (F0) of a speech signal, which corresponds to pitch, is a key feature in a variety of speech processing tasks. Accurate F0 estimation has therefore remained an important open problem for decades. The problem is difficult, especially in low signal-to-noise ratio (SNR) conditions with unknown noise. In this work, we propose new approaches to noise-robust F0 estimation using recurrent neural networks (RNNs). Recent F0 estimation studies exploit deep neural networks (DNNs), including RNNs, to classify acoustic features into quantized frequency states. In contrast to these classification approaches, we put forward a regression method for F0 tracking, implemented with RNNs. To this end, we propose two variants. Our first model predicts the (scalar) F0 value directly from a spectrum, while our second model predicts a target sinusoidal waveform (with the desired F0) from the raw speech waveform. Our experiments with the pitch tracking database from Graz University of Technology (PTDB-TUG), contaminated by additive noise (NOISEX-92), show that the proposed approaches improve the gross pitch error (GPE) and fine pitch error (FPE) rates by more than 35% at SNRs between -10 dB and 10 dB relative to a well-known, noise-robust F0 tracker, PEFAC. Furthermore, our methods outperform state-of-the-art neural network-based approaches by more than 15% in terms of both the FPE and GPE rates over the same SNR range.
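The GPE and FPE rates named in the abstract can be illustrated with a minimal sketch. The record does not give the paper's exact definitions, so this follows one common convention and should be read as an assumption: GPE is the fraction of voiced frames whose relative F0 error exceeds a threshold (20% here), and FPE is the mean relative error over the remaining voiced frames. The function name gpe_fpe, the threshold default, and the use of F0 = 0 to mark unvoiced frames are all hypothetical choices for illustration.

```python
import numpy as np


def gpe_fpe(f0_ref, f0_est, threshold=0.2):
    """Return (GPE, FPE) for two frame-aligned F0 tracks in Hz.

    Assumptions (not taken from the paper):
    - frames with f0_ref == 0 are unvoiced and are ignored;
    - a gross error is a relative deviation above `threshold`;
    - FPE is the mean relative deviation of the non-gross frames.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    if f0_ref.shape != f0_est.shape:
        raise ValueError("reference and estimate must have the same shape")

    voiced = f0_ref > 0
    if not voiced.any():
        raise ValueError("reference contains no voiced frames")

    # Relative error on voiced frames only.
    rel_err = np.abs(f0_est[voiced] - f0_ref[voiced]) / f0_ref[voiced]

    gross = rel_err > threshold
    gpe = float(gross.mean())

    fine = rel_err[~gross]
    fpe = float(fine.mean()) if fine.size else float("nan")
    return gpe, fpe


if __name__ == "__main__":
    reference = [100.0, 200.0, 0.0, 100.0]   # third frame is unvoiced
    estimate = [105.0, 300.0, 50.0, 100.0]
    print(gpe_fpe(reference, estimate))
```

Both values are returned as fractions; multiply by 100 to report them as percentages. Other conventions exist (for example, FPE as a standard deviation, or errors measured in cents), so the threshold and the FPE statistic should be matched to whichever definition a given evaluation uses.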
Pages: 2336 - 2349
Number of pages: 14
Related papers
50 items in total
  • [31] Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks
    Valentini-Botinhao, Cassia
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 352 - 356
  • [32] Robust Learning of Recurrent Neural Networks in Presence of Exogenous Noise
    Amini, Arash
    Liu, Guangyi
    Motee, Nader
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 783 - 788
  • [33] Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
    Hua, Kanru
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 337 - 341
  • [34] Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
    Zhaojie Luo
    Jinhui Chen
    Tetsuya Takiguchi
    Yasuo Ariki
    EURASIP Journal on Audio, Speech, and Music Processing, 2017
  • [35] Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
    Luo, Zhaojie
    Chen, Jinhui
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2017,
  • [36] Robust stability and robust periodicity of delayed recurrent neural networks with noise disturbance
    Li, Chunguang
    Liao, Xiaofeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2006, 53 (10) : 2265 - 2273
  • [37] Deep Recurrent Neural Networks for Ionospheric Variations Estimation Using GNSS Measurements
    Kaselimi, Maria
    Voulodimos, Athanasios
    Doulamis, Nikolaos
    Doulamis, Anastasios
    Delikaraoglou, Demitris
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [38] Fitting deep neural networks into the statistical regression modelling setting
    Ha, Il Do
    Burke, Kevin
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2024,
  • [39] Robust nonparametric regression based on deep ReLU neural networks
    Chen, Juntong
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2024, 233
  • [40] Robust Multipitch Estimation of Piano Sounds Using Deep Spiking Neural Networks
    Qian, Hanxiao
    Gu, Pengjie
    Yan, Rui
    Tang, Huajin
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 2335 - 2341