Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986)

Cited by: 11
Authors
Whalen, D. H. [1, 3]
Chen, Wei-Rong [1 ]
Shadle, Christine H. [1 ]
Fulop, Sean A. [2 ]
Affiliations
[1] Haskins Labs Inc, New Haven, CT 06511 USA
[2] Calif State Univ Fresno, Dept Linguist, Fresno, CA 93740 USA
[3] City Univ New York, New York, NY 10016 USA
Funding
National Institutes of Health (NIH), USA
Keywords
REASSIGNED SPECTROGRAM; MEASUREMENT ERRORS; FREQUENCY; SPEECH; VOWELS; ACCURACY; TRACKING; INFANTS; APRAXIA; SPEAKER;
DOI
10.1121/10.0013410
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias. © 2022 Acoustical Society of America.
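Recommendation (4) can be illustrated with a rough screening heuristic. The sketch below is not the authors' procedure; it is one assumed way to look at the F0 vs F1 pattern. For each token it measures how far the F1 estimate lies from the nearest integer multiple of F0, in units of F0. If F1 estimates were unrelated to the harmonics, these offsets would be spread roughly evenly over [-0.5, 0.5] and their mean absolute value would be near 0.25; a value well below that hints at harmonic attraction. The function names and the sample values are hypothetical.

```python
def harmonic_offset(f0_hz, f1_hz):
    """Signed distance from F1 to the nearest harmonic of F0, in units of F0.

    The result lies in [-0.5, 0.5]; values near 0 mean the F1 estimate
    sits on or very close to a harmonic.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    ratio = f1_hz / f0_hz
    return ratio - round(ratio)


def mean_abs_offset(pairs):
    """Mean absolute harmonic offset over (F0, F1) pairs, both in Hz.

    Roughly 0.25 is expected if F1 estimates ignore harmonic positions
    (assumption: offsets uniform on [-0.5, 0.5]); clearly smaller values
    suggest that estimates cluster on harmonics.
    """
    offsets = [abs(harmonic_offset(f0, f1)) for f0, f1 in pairs]
    if not offsets:
        raise ValueError("need at least one (F0, F1) pair")
    return sum(offsets) / len(offsets)


# Hypothetical measurements for illustration only, not data from the paper.
tokens = [(200.0, 600.0), (200.0, 650.0)]
score = mean_abs_offset(tokens)
```

A score like this is only a first-pass flag. It says nothing about where the true resonance lies, and at high F0 the harmonics are so widely spaced that even an unbiased estimator has few places to land.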
Pages: 933-941
Page count: 9