The interplay between the auditory and visual modality for end-of-utterance detection

Cited by: 22
Authors
Barkhuysen, Pashiera [1]
Krahmer, Erniel [1]
Swerts, Marc [1]
Affiliations
[1] Tilburg Univ, Fac Arts, NL-5000 LE Tilburg, Netherlands
DOI
10.1121/1.2816561
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The existence of auditory cues such as intonation, rhythm, and pausing that facilitate end-of-utterance detection is by now well established. It has been argued repeatedly that speakers may also employ visual cues to indicate that they are at the end of their utterance. This raises at least two questions, which are addressed in the current paper. First, which modalities do speakers use for signalling finality and nonfinality, and second, how sensitive are observers to these signals? Our goal is to investigate the relative contribution of three different conditions to end-of-utterance detection: the two unimodal ones, vision only and audio only, and their bimodal combination. Speaker utterances were collected via a novel semicontrolled production experiment, in which participants provided lists of words in an interview setting. The data thus collected were used in two perception experiments, which systematically compared responses to unimodal (audio-only and vision-only) and bimodal (audio-visual) stimuli. Experiment I is a reaction time experiment, which revealed that humans are significantly quicker in end-of-utterance detection when confronted with bimodal or audio-only stimuli than with vision-only stimuli. No significant differences in reaction times were found between the bimodal and audio-only conditions, and therefore a second experiment was conducted. Experiment II is a classification experiment, which showed that participants perform significantly better in the bimodal condition than in the two unimodal ones. Both the first and the second experiment revealed interesting differences between speakers in the various conditions, which indicates that some speakers are more expressive in the visual and others in the auditory modality. © 2008 Acoustical Society of America.
Pages: 354-365
Number of pages: 12