Gated Bilinear Networks for Vowel Formant Estimation

Authors
Dai, Wang [1]
Hua, Zheng [1]
Zhang, Jinsong [1]
Xie, Yanlu [1]
Lin, Binghuai [2]
Affiliations
[1] Beijing Language & Culture Univ, Sch Informat Sci, Beijing, Peoples R China
[2] Tencent Technol Co Ltd, Smart Platform Prod Dept, Beijing, Peoples R China
Source
2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020) | 2020
Keywords
vowel formant estimation; Bilinear Network; Temporal Attention-Augmented Bilinear Network; gate mechanism; TRACKING; FREQUENCIES; PREDICTION
DOI
10.1109/ialp51396.2020.9310481
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Formant estimation from vowel segments is very useful for linguistic purposes. Traditionally, formants are estimated using classical signal processing methods and statistical models: the formant frequencies extracted continuously along a vowel segment are averaged, and the average is taken as the formant of the vowel. Newer approaches use neural networks to predict vowel formants on an annotated database, where the input is acoustic features and the output is the mean formant frequency value. Recently, the Bilinear Network (BL) and the Temporal Attention-Augmented Bilinear Network (TABL) have proven very effective on financial time-series analysis tasks compared with recurrent and convolutional networks. Since that task is similar to ours, we explored how to extend the structure of BL, drawing on TABL, to obtain better short-term modeling capability for vowel formant estimation. More specifically, we proposed replacing the attention mechanism with a sigmoid gate and using a learnable parameter to dynamically integrate the output of the first linear transformation, so that BL learns a better representation. Experiments on the vowel test set of the public VTR corpus showed that our approach significantly surpassed DNN, CNN, and BL, and achieved slightly better performance than the powerful TABL model in terms of mean absolute error and mean absolute percentage error on F1, F2, F3, and overall.
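The gating idea and the two error metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no equations, so the layer below assumes the usual TABL layout (a first linear map over the feature axis, a temporal weighting, a scalar mixing parameter, a second linear map over the time axis) with the softmax attention swapped for an element-wise sigmoid. The function names, the weight names `w1`, `w`, `w2`, the mixing scalar `lam`, and the omission of bias and activation are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_bilinear(x, w1, w, w2, lam):
    """Hypothetical gated bilinear layer (assumed form, no bias/activation).

    x   : D x T input (features x frames)
    w1  : D' x D first linear transformation (feature axis)
    w   : T x T weights that produce the gate (assumed, as in TABL)
    w2  : T x T' second linear transformation (time axis)
    lam : scalar in [0, 1] mixing gated and ungated features
    """
    x_bar = matmul(w1, x)                       # D' x T
    gate = [[sigmoid(v) for v in row]           # sigmoid instead of softmax
            for row in matmul(x_bar, w)]
    mixed = [[lam * xb * g + (1.0 - lam) * xb   # blend gated and raw output
              for xb, g in zip(row_x, row_g)]
             for row_x, row_g in zip(x_bar, gate)]
    return matmul(mixed, w2)                    # D' x T'


def mean_absolute_error(pred, ref):
    """Mean absolute error, in the unit of the inputs (e.g. Hz)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def mean_absolute_percentage_error(pred, ref):
    """Mean absolute percentage error, in percent of the reference."""
    return 100.0 * sum(abs(p - r) / abs(r)
                       for p, r in zip(pred, ref)) / len(ref)
```

With `lam = 0` the sketch reduces to a plain bilinear layer, and with `lam = 1` every entry of the first transformation's output is scaled by its gate; in a trained model `lam` and the weights would be learned rather than set by hand.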
Pages: 205-209
Number of pages: 5