Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2
Authors
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliation
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Source
INTERSPEECH 2021 | 2021
Keywords
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;
DOI
10.21437/Interspeech.2021-1690
Chinese Library Classification numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Phonetic analysis often requires reliable estimation of formants, but estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several others, but these trackers require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking, with accuracy similar to that of [1]. Frame-wise spectral envelopes serve as the input. The outputs are estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and one generated from the estimated formant parameters, is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
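The abstract does not specify how the envelope is generated from the formant parameters or the exact form of the loss. The following is a minimal sketch of the general idea, not the authors' implementation, and it rests on several assumptions. Each pole (or zero) is treated as a conjugate pair with center frequency F and bandwidth B. These are mapped to the z-plane with the standard relations r = exp(-πB/fs) and θ = 2πF/fs. Contributions are summed in the log (dB) domain, and the loss is a mean squared dB error. The names `resonance_db`, `synth_envelope_db`, and `envelope_loss` are hypothetical. The single scalar `gain_db` is also hypothetical: it stands in for the paper's amplitude adjustments, whose form the abstract does not give. This NumPy version only computes the loss. Training as described would express the same operations in an automatic-differentiation framework so that the gradient flows back to the network's formant-parameter outputs.

```python
import numpy as np


def resonance_db(freqs_hz, center_hz, bandwidth_hz, fs):
    """Log-magnitude (dB) of one conjugate pole pair (a second-order all-pole
    section) evaluated at the analysis frequencies freqs_hz."""
    r = np.exp(-np.pi * bandwidth_hz / fs)             # pole radius from bandwidth
    theta = 2.0 * np.pi * center_hz / fs               # pole angle from center frequency
    z_inv = np.exp(-1j * 2.0 * np.pi * freqs_hz / fs)  # e^{-j*omega} on the unit circle
    denom = (1 - r * np.exp(1j * theta) * z_inv) * (1 - r * np.exp(-1j * theta) * z_inv)
    return -20.0 * np.log10(np.abs(denom))


def synth_envelope_db(freqs_hz, poles, zeros, fs, gain_db=0.0):
    """Build a log envelope from (center_hz, bandwidth_hz) pairs.
    Assumption: gain_db is a single hypothetical overall level offset."""
    env = np.full(freqs_hz.shape, gain_db, dtype=float)
    for f, b in poles:
        env += resonance_db(freqs_hz, f, b, fs)
    for f, b in zeros:
        # A zero pair is the reciprocal of a pole pair, so its dB response is negated.
        env -= resonance_db(freqs_hz, f, b, fs)
    return env


def envelope_loss(target_db, poles, zeros, freqs_hz, fs, gain_db=0.0):
    """Assumed loss: mean squared dB difference between the input envelope
    and the envelope synthesized from the estimated parameters."""
    pred = synth_envelope_db(freqs_hz, poles, zeros, fs, gain_db)
    return float(np.mean((target_db - pred) ** 2))
```

As a usage example, consider a 257-point grid from 0 to 8 kHz at fs = 16000. A target built with `synth_envelope_db(freqs, [(500, 80), (1500, 100), (2500, 120)], [], 16000)` gives a loss of 0.0 when scored against the same parameters. Shifting the first pole to 600 Hz gives a positive loss. Working in the log domain keeps the pole and zero contributions additive, which makes the generated envelope simple to differentiate with respect to each frequency and bandwidth.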
Pages: 1189-1193
Page count: 5