Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2

Authors:

Lilley, Jason [1]

Bunnell, H. Timothy [1]

Affiliation:

[1] Nemours Biomed Res, Wilmington, DE 19803 USA

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;

DOI:

10.21437/Interspeech.2021-1690

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Phonetic analysis often requires reliable estimation of formants, but estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several others, but that require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to [1]. Frame-wise spectral envelopes serve as the input. The output consists of estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure based on the difference between the input envelope and one generated from the estimated formant parameters is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders, in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
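
The reconstruction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a poles-only cascade of two-pole digital resonators, a mean squared error in dB as the loss, and hypothetical names (`envelope_db`, `reconstruction_loss`) and values (16 kHz sampling rate, 50 Hz frequency grid). Zeros, amplitude adjustments, the network itself, and backpropagation are all omitted; in practice an automatic differentiation framework would supply the gradients.

```python
import cmath
import math


def envelope_db(formants, grid_hz, fs):
    """Log-magnitude response (dB) of cascaded two-pole resonators.

    formants: list of (frequency_hz, bandwidth_hz) pairs (poles only;
    a simplifying assumption, the abstract also mentions zeros).
    """
    out = []
    for f in grid_hz:
        z_inv = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
        h = 1.0 + 0j
        for freq, bw in formants:
            r = math.exp(-math.pi * bw / fs)  # pole radius from bandwidth
            pole = r * cmath.exp(2j * math.pi * freq / fs)
            # Each formant contributes a complex-conjugate pole pair.
            h /= (1 - pole * z_inv) * (1 - pole.conjugate() * z_inv)
        out.append(20.0 * math.log10(abs(h)))
    return out


def reconstruction_loss(target_db, formants, grid_hz, fs):
    """Mean squared difference (dB^2) between target and regenerated envelope."""
    est = envelope_db(formants, grid_hz, fs)
    return sum((t - e) ** 2 for t, e in zip(target_db, est)) / len(est)


# Hypothetical example: the "input" envelope is synthesized from known
# parameters, so the loss is zero when the estimate matches them.
fs = 16000
grid = [i * 50.0 for i in range(1, 160)]  # 50 Hz to 7950 Hz
true_formants = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]
target = envelope_db(true_formants, grid, fs)

loss_true = reconstruction_loss(target, true_formants, grid, fs)
loss_off = reconstruction_loss(
    target, [(700.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)], grid, fs
)
```

Here `loss_off` is larger than `loss_true` because the first formant frequency is displaced; a trained network would be driven toward parameters that minimize this kind of mismatch, without ever seeing hand-labeled formant values.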

Pages: 1189-1193

Page count: 5
