Unsupervised Training of a DNN-based Formant Tracker

Cited by: 2

Authors:

Lilley, Jason [1]

Bunnell, H. Timothy [1]

Affiliation:

[1] Nemours Biomed Res, Wilmington, DE 19803 USA

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech analysis; formant estimation; formant tracking; deep learning; acoustic models of speech; SPEECH;

DOI:

10.21437/Interspeech.2021-1690

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology];

Subject classification codes:

100104 ; 100213 ;

Abstract:

Phonetic analysis often requires reliable estimation of formants, but estimates provided by popular programs can be unreliable. Recently, Dissen et al. [1] described DNN-based formant trackers that produced more accurate frequency estimates than several others, but that require manually corrected formant data for training. Here we describe a novel unsupervised training method for corpus-based DNN formant parameter estimation and tracking with accuracy similar to [1]. Frame-wise spectral envelopes serve as the input. The output consists of estimates of the frequencies and bandwidths, plus amplitude adjustments, for a prespecified number of poles and zeros, hereafter referred to as "formant parameters." A custom loss measure, based on the difference between the input envelope and one generated from the estimated formant parameters, is calculated and backpropagated through the network to establish the gradients with respect to the formant parameters. The approach is similar to that of autoencoders in that the model is trained to reproduce its input in order to discover latent features, in this case the formant parameters. Our results demonstrate that a reliable formant tracker can be constructed for a speech corpus without the need for hand-corrected training data.
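The reconstruction idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact loss, the envelope generator, or how zeros and amplitude adjustments enter. The sketch assumes poles only, a mean squared difference in dB as the loss, and a 16 kHz sampling rate; the function names `envelope_db` and `envelope_loss` are hypothetical. In actual training, the same computation would be written in an automatic-differentiation framework so that the loss can be backpropagated to the network that predicts the formant parameters.

```python
import numpy as np

def envelope_db(freqs_hz, bws_hz, grid_hz, fs=16000.0):
    """Log-magnitude envelope (dB) of an all-pole filter.

    Each (frequency, bandwidth) pair contributes one second-order
    resonator. Poles only: zeros and amplitude adjustments, which the
    abstract mentions, are left out of this sketch.
    """
    w = 2.0 * np.pi * np.asarray(grid_hz, dtype=float) / fs
    z_inv = np.exp(-1j * w)              # z^-1 evaluated on the unit circle
    h = np.ones_like(z_inv)              # complex frequency response
    for f, b in zip(freqs_hz, bws_hz):
        r = np.exp(-np.pi * b / fs)      # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs     # pole angle from center frequency
        h = h / (1.0 - 2.0 * r * np.cos(theta) * z_inv + (r ** 2) * z_inv ** 2)
    return 20.0 * np.log10(np.abs(h))

def envelope_loss(target_db, freqs_hz, bws_hz, grid_hz, fs=16000.0):
    """Assumed loss: mean squared dB difference between two envelopes."""
    estimate_db = envelope_db(freqs_hz, bws_hz, grid_hz, fs)
    return float(np.mean((np.asarray(target_db) - estimate_db) ** 2))

if __name__ == "__main__":
    grid = np.arange(0.0, 8000.0, 10.0)
    # Synthetic "input envelope" built from known formant parameters.
    target = envelope_db([500.0, 1500.0, 2500.0], [80.0, 100.0, 120.0], grid)
    good = envelope_loss(target, [500.0, 1500.0, 2500.0], [80.0, 100.0, 120.0], grid)
    bad = envelope_loss(target, [600.0, 1400.0, 2700.0], [80.0, 100.0, 120.0], grid)
    print(good, bad)
```

The loss is zero when the estimated parameters regenerate the input envelope exactly and grows as they drift away from it, which is what lets the envelope itself act as the training signal in place of hand-corrected formant labels.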

Pages: 1189-1193

Page count: 5
