The cocktail-party problem revisited: early processing and selection of multi-talker speech

Cited by: 266
Author
Bronkhorst, Adelbert W. [1, 2]
Affiliations
[1] TNO Human Factors, NL-3769 ZG Soesterberg, Netherlands
[2] Vrije Univ Amsterdam, Dept Cognit Psychol, NL-1081 BT Amsterdam, Netherlands
Keywords
Attention; Auditory scene analysis; Cocktail-party problem; Informational masking; Speech perception; HUMAN AUDITORY-CORTEX; INTERAURAL TIME DIFFERENCES; RECEPTION THRESHOLD; FUNDAMENTAL-FREQUENCY; ENERGETIC MASKING; INFORMATIONAL MASKING; PERCEPTUAL SEPARATION; INTELLIGIBILITY INDEX; MISMATCH NEGATIVITY; ATTENTIONAL CAPTURE;
DOI
10.3758/s13414-015-0882-9
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and "unmasking" resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping (the segregation and streaming of sounds) represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped, and subsequently selected, using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there also is evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on other than spatial or voice-related cues, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.
Pages: 1465-1487
Number of pages: 23