Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2
Authors
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliation
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Source
INTERSPEECH 2021 | 2021
Keywords
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;
DOI
10.21437/Interspeech.2021-1690
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline codes
100104; 100213;
Abstract
Phonetic analysis often requires reliable estimation of formants, but the estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several other trackers, but which require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to [1]. Frame-wise spectral envelopes serve as the input. The output consists of estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and one generated from the estimated formant parameters, is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders, in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
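The core of the training scheme described above is the reconstruction loss: a spectral envelope is resynthesized from the estimated pole (formant) parameters and compared against the input envelope. The following is a minimal NumPy sketch of that idea under simplifying assumptions; the function names and the choice of mean squared error on the log envelope are illustrative, not taken from the paper, and the sketch omits the zeros, amplitude adjustments, and the DNN itself.

```python
import numpy as np

def pole_log_envelope(freqs_hz, bws_hz, fs=16000, n_bins=257):
    """Log-magnitude spectral envelope of an all-pole filter whose
    complex-conjugate pole pairs are given by formant frequencies
    and bandwidths (a standard digital resonator parameterization)."""
    w = np.linspace(0.0, np.pi, n_bins)     # analysis frequencies (rad/sample)
    z = np.exp(1j * w)                      # points on the unit circle
    log_env = np.zeros(n_bins)
    for f, b in zip(freqs_hz, bws_hz):
        r = np.exp(-np.pi * b / fs)         # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs        # pole angle from frequency
        # magnitude response contributed by one conjugate pole pair
        denom = np.abs((1 - r * np.exp(1j * theta) / z) *
                       (1 - r * np.exp(-1j * theta) / z))
        log_env -= np.log(denom)
    return log_env

def envelope_loss(est_freqs, est_bws, target_log_env, fs=16000):
    """Mean squared error between the target log envelope and the one
    resynthesized from the estimated formant parameters; in training,
    this value would be backpropagated through the network."""
    est = pole_log_envelope(est_freqs, est_bws, fs, len(target_log_env))
    return float(np.mean((est - target_log_env) ** 2))
```

In an actual implementation, `pole_log_envelope` would be written with differentiable tensor operations so that gradients with respect to the formant parameters flow back into the network, as the abstract describes; this NumPy version only illustrates the forward computation of the loss.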
Pages: 1189 - 1193
Page count: 5