Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2
Authors
Lilley, Jason [1]
Bunnell, H. Timothy [1]
Affiliation
[1] Nemours Biomed Res, Wilmington, DE 19803 USA
Source
INTERSPEECH 2021 | 2021
Keywords
speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;
DOI
10.21437/Interspeech.2021-1690
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline codes
100104; 100213;
Abstract
Phonetic analysis often requires reliable estimation of formants, but the estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several other trackers, but which require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to [1]. Frame-wise spectral envelopes serve as the input. The output consists of estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and one generated from the estimated formant parameters, is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders, in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
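The core of the training scheme described above is the reconstruction loss: a spectral envelope is resynthesized from the estimated pole (formant) parameters and compared against the input envelope. The following is a minimal NumPy sketch of that idea under simplifying assumptions; the function names and the choice of mean squared error on the log envelope are illustrative, not taken from the paper, and the sketch omits the zeros, amplitude adjustments, and the DNN itself.

```python
import numpy as np

def pole_log_envelope(freqs_hz, bws_hz, fs=16000, n_bins=257):
    """Log-magnitude spectral envelope of an all-pole filter whose
    complex-conjugate pole pairs are given by formant frequencies
    and bandwidths (a standard digital resonator parameterization)."""
    w = np.linspace(0.0, np.pi, n_bins)     # analysis frequencies (rad/sample)
    z = np.exp(1j * w)                      # points on the unit circle
    log_env = np.zeros(n_bins)
    for f, b in zip(freqs_hz, bws_hz):
        r = np.exp(-np.pi * b / fs)         # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs        # pole angle from frequency
        # magnitude response contributed by one conjugate pole pair
        denom = np.abs((1 - r * np.exp(1j * theta) / z) *
                       (1 - r * np.exp(-1j * theta) / z))
        log_env -= np.log(denom)
    return log_env

def envelope_loss(est_freqs, est_bws, target_log_env, fs=16000):
    """Mean squared error between the target log envelope and the one
    resynthesized from the estimated formant parameters; in training,
    this value would be backpropagated through the network."""
    est = pole_log_envelope(est_freqs, est_bws, fs, len(target_log_env))
    return float(np.mean((est - target_log_env) ** 2))
```

In an actual implementation, `pole_log_envelope` would be written with differentiable tensor operations so that gradients with respect to the formant parameters flow back into the network, as the abstract describes; this NumPy version only illustrates the forward computation of the loss.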
Pages: 1189 - 1193
Page count: 5