Should recognizers have ears?

Cited by: 91
Author
Hermansky, H
Affiliations
[1] Oregon Grad Inst Sci & Technol, Portland, OR USA
[2] Int Comp Sci Inst, Berkeley, CA 94704 USA
[3] Tech Univ, Brno, Czech Republic
Funding
National Science Foundation (USA);
Keywords
auditory modeling; human-like processing; modulation frequency; automatic speech recognition;
DOI
10.1016/S0167-6393(98)00027-2
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Recently, techniques motivated by human auditory perception have been applied in mainstream speech technology, and there seems to be renewed interest in building more knowledge of human speech communication into the design of speech recognizers. The paper discusses the author's experience with applying auditory knowledge to automatic speech recognition (ASR). It advances the notion that the reason for applying such knowledge in speech engineering should be the ability of perception to suppress some of the irrelevant information in the speech message, and it argues against the blind implementation of scattered, accidental knowledge that may be irrelevant to a speech recognition task. Three properties of human speech perception are discussed in some detail: limited spectral resolution, the use of information from roughly syllable-length segments, and the ability to ignore corrupted or irrelevant components of speech. By referring to published work, the paper shows that selective use of auditory knowledge, optimized on and in some cases derived from real speech data, can be consistent with current stochastic approaches to ASR and could yield advantages in practical engineering applications. © 1998 Elsevier Science B.V. All rights reserved.
Pages: 3-27
Page count: 25