Gated Bilinear Networks for Vowel Formant Estimation

Cited by: 0
Authors
Dai, Wang [1 ]
Hua, Zheng [1 ]
Zhang, Jinsong [1 ]
Xie, Yanlu [1 ]
Lin, Binghuai [2 ]
Affiliations
[1] Beijing Language & Culture Univ, Sch Informat Sci, Beijing, Peoples R China
[2] Tencent Technol Co Ltd, Smart Platform Prod Dept, Beijing, Peoples R China
Source
2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020
Keywords
vowel formant estimation; Bilinear Network; Temporal Attention-Augmented Bilinear Network; gate mechanism; TRACKING; FREQUENCIES; PREDICTION;
DOI
10.1109/ialp51396.2020.9310481
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Formant estimation from vowel segments is very useful for linguistic purposes. Traditionally, formants are estimated with classical signal processing methods and statistical models: formant frequencies are extracted continuously along the vowel segment, and their average is taken as the vowel's formant. Newer approaches use neural networks trained on an annotated database to predict vowel formants, where the input is acoustic features and the output is the mean formant frequency value. Recently, the Bilinear Network (BL) and the Temporal Attention-Augmented Bilinear Network (TABL) have proven very effective on financial time-series analysis tasks compared with recurrent and convolutional networks. In this work, we explored how to extend the structure of BL, drawing on TABL, to obtain better short-term modeling capability for vowel formant estimation. More specifically, we proposed to replace the attention mechanism with a sigmoid gate and to use a learnable parameter to dynamically integrate the output of the first linear transformation, thus learning a better representation in BL. Experiments on the vowel test set of the public VTR corpus showed that our approach significantly surpassed DNN, CNN, and BL, and achieved slightly better performance than the powerful TABL model in terms of mean absolute error and mean absolute percentage error on F1, F2, F3, and overall.
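The gated bilinear layer the abstract describes can be sketched as follows. This is a minimal NumPy sketch under our own assumptions about shapes and gate placement (the paper's exact formulation may differ): the first linear transformation maps the feature axis, a sigmoid gate (standing in for TABL's softmax attention) weights the result, a learnable scalar `lam` mixes the gated and ungated activations, and a second transformation maps the temporal axis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_bilinear_layer(X, W1, W2, Wg, lam):
    """One gated bilinear layer (illustrative sketch, not the paper's code).

    X:   (D, T) input acoustic features over T time steps
    W1:  (D_out, D) feature-space transform (first linear transformation)
    W2:  (T, T_out) temporal transform (second linear transformation)
    Wg:  (T, T) gate weights over the temporal axis
    lam: scalar in [0, 1]; in training this would be a learnable parameter
    """
    Xbar = W1 @ X                                   # first linear transformation
    G = sigmoid(Xbar @ Wg)                          # sigmoid gate replaces softmax attention
    Xtilde = lam * (Xbar * G) + (1.0 - lam) * Xbar  # dynamic mix of gated/ungated output
    return Xtilde @ W2                              # temporal (second) transformation
```

Note that with `lam = 0` the layer reduces to a plain bilinear mapping `W1 @ X @ W2`, which is the design point of the learnable mixing parameter: the network can interpolate between the gated and the original BL representation.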
Pages: 205-209 (5 pages)