Relationship between contributions of temporal amplitude envelope of speech and modulation transfer function in room acoustics to perception of noise-vocoded speech

Cited by: 13
Authors
Unoki, Masashi [1]
Zhu, Zhi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, Sch Informat Sci, 1-1 Asahidai, Nomi 9231292, Japan
Keywords
Temporal amplitude envelope; Modulation transfer function; Noise-vocoded speech; Vocal-emotion recognition; Speech intelligibility; Temporal modulation-spectral feature; VOCAL-EMOTION; SPEAKER INDIVIDUALITY; LISTENING DIFFICULTY; OBJECTIVE MEASURES; SPECTRAL FEATURES; RECOGNITION; FREQUENCY; INTELLIGIBILITY; REVERBERANT; INFORMATION;
DOI
10.1250/ast.41.233
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech signals can be represented as a sum of amplitude-modulated frequency bands. This sum can also be regarded as a temporal amplitude envelope (TAE) with temporal fine structure. Our previous studies using noise-vocoded speech (NVS) showed that the TAE of speech plays an important role in the perception of linguistic information (speech intelligibility) as well as non-linguistic information (e.g., vocal-emotion recognition). Those studies found that an upper limit of the TAE modulation frequency between 4 and 8 Hz is important for speech intelligibility, while an upper limit between 8 and 16 Hz is important for vocal-emotion recognition. However, speech intelligibility generally degrades dramatically under reverberation. The concept of the modulation transfer function (MTF) relates the input and output TAEs of an enclosure to the characteristics of that enclosure under reverberant conditions. It was introduced as a measure in room acoustics for assessing the effect of an enclosure on speech intelligibility. For this study, we conducted two experiments, a word intelligibility test and a vocal-emotion recognition test, with NVS under reverberant conditions to investigate the relationship between the contributions of the TAE of speech and the MTF of reverberation to modulation perception of NVS. We also pointed out that a straightforward scheme, i.e., one relating the contributions of the static features (peak/slope) in the modulation spectrum (MS) of speech to the MTF of reverberation, cannot consistently account for the perception of both linguistic and non-linguistic information observed in these perceptual data of NVS under reverberant conditions. We then developed a scheme in which the relationship between the contributions of the temporal MS features and the MTF of reverberation to modulation perception can consistently account for these perceptual data of NVS.
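As a rough illustration of how reverberation acts on the modulation frequencies mentioned in the abstract, the sketch below evaluates the classical MTF for an idealized, exponentially decaying room response, m(f_m) = 1 / sqrt(1 + (2π f_m T60 / 13.8)^2). This is an assumption made for illustration only: the abstract does not say which MTF model or reverberation times the authors used, and the function names and the T60 value here are hypothetical.

```python
import math


def reverberation_mtf(mod_freq_hz: float, rt60_s: float) -> float:
    """Modulation reduction factor for an ideal exponential reverberant decay.

    Assumes the textbook model m = 1 / sqrt(1 + (2*pi*f_m*T60 / 13.8)**2);
    this is not necessarily the model used in the paper.
    """
    if mod_freq_hz < 0 or rt60_s < 0:
        raise ValueError("modulation frequency and T60 must be non-negative")
    x = 2.0 * math.pi * mod_freq_hz * rt60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


def mtf_table(mod_freqs_hz, rt60_s):
    """Map each modulation frequency to its MTF value at one reverberation time."""
    return {f: reverberation_mtf(f, rt60_s) for f in mod_freqs_hz}


if __name__ == "__main__":
    # 4, 8, and 16 Hz are the modulation-frequency limits named in the abstract;
    # the 1.0 s reverberation time is an arbitrary example value.
    for freq, m in mtf_table([4.0, 8.0, 16.0], rt60_s=1.0).items():
        print(f"f_m = {freq:5.1f} Hz  ->  m = {m:.3f}")
```

Under this model the MTF behaves as a low-pass filter on the envelope, so for a fixed reverberation time the 8 to 16 Hz range is attenuated more strongly than the 4 to 8 Hz range.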
Pages: 233-244
Number of pages: 12