Behavioral Account of Attended Stream Enhances Neural Tracking

Cited by: 3
Authors
Huet, Moira-Phoebe [1 ,2 ]
Micheyl, Christophe [3 ]
Parizet, Etienne [1 ]
Gaudrain, Etienne [2 ,4 ]
Affiliations
[1] Univ Lyon, Inst Natl Sci Appl Lyon, Lab Vibrat Acoust, Villeurbanne, France
[2] INSERM, CNRS, Auditory Cognit & Psychoacoust Team, Lyon Neurosci Res Ctr,UMR 5292,U1028, Lyon, France
[3] Starkey France Sarl, Creteil, France
[4] Univ Groningen, Univ Med Ctr Groningen, Dept Otorhinolaryngol, Groningen, Netherlands
Keywords
neural tracking; attentional switches; temporal response function (TRF); speech-on-speech; vocal cues; VOCAL-TRACT LENGTH; AUDITORY ATTENTION; SPEECH; SPEAKER; NOISE; MEG; INTELLIGIBILITY; ENVIRONMENT; PERCEPTION; EXTRACTION;
DOI
10.3389/fnins.2021.674112
Chinese Library Classification
Q189 [Neuroscience]
Discipline code
071006
Abstract
During the past decade, several studies have identified electroencephalographic (EEG) correlates of selective auditory attention to speech. In these studies, listeners are typically instructed to focus on one of two concurrent speech streams (the "target") while ignoring the other (the "masker"). EEG signals are recorded while participants perform this task and are subsequently analyzed to recover the attended stream. These studies often assume that the participant's attention remains focused on the target throughout the test. To check this assumption, and to assess when a participant's attention in a concurrent-speech listening task was directed toward the target, the masker, or neither, we designed a behavioral listen-then-recall task (the Long-SWoRD test). After listening to two simultaneous short stories, participants had to identify, on a computer screen, keywords from the target story randomly interspersed among words from the masker story and words from neither story. To modulate task difficulty, and hence the likelihood of attentional switches, masker stories were originally uttered by the same talker as the target stories, and the masker voice parameters were then manipulated to parametrically control the similarity of the two streams, from clearly dissimilar to almost identical. While participants listened to the stories, EEG signals were recorded and subsequently analyzed using a temporal response function (TRF) model to reconstruct the speech stimuli. Responses in the behavioral recall task were used to infer, retrospectively, when attention was directed toward the target, the masker, or neither.
During the model-training phase, the results of these behavioral-data-driven inferences were provided to the model alongside the EEG signals, to determine whether this additional information would improve stimulus-reconstruction accuracy relative to models trained under the assumption that the listener's attention was unwaveringly focused on the target. Results from 21 participants show that information regarding the actual, as opposed to assumed, attentional focus can be used advantageously during model training to enhance subsequent (test-phase) accuracy of auditory stimulus reconstruction from EEG signals. This is especially the case in challenging listening situations, where the participants' attention is less likely to remain focused entirely on the target talker. In situations where the two competing voices are clearly distinct and easily separated perceptually, the assumption that listeners can stay focused on the target is reasonable. The behavioral recall protocol introduced here provides experimenters with a means to behaviorally track fluctuations in auditory selective attention, including in combined behavioral and neurophysiological studies.
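The stimulus-reconstruction approach described in the abstract is commonly implemented as a backward TRF: a ridge-regression decoder mapping time-lagged EEG channels onto the speech envelope, with reconstruction quality scored by correlating the decoded and true envelopes. The sketch below is a minimal illustration of that general technique in plain NumPy, not the authors' actual pipeline; the `mask` argument is a hypothetical stand-in for the behaviorally derived attended/unattended labels used to restrict training samples.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels*n_lags)."""
    n, ch = eeg.shape
    X = np.zeros((n, ch * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * ch:(lag + 1) * ch] = eeg[:n - lag]
    return X

def train_decoder(eeg, envelope, n_lags, lam=1.0, mask=None):
    """Backward TRF via ridge regression; 'mask' keeps only samples
    behaviorally labeled as target-attended (all samples if None)."""
    X = lag_matrix(eeg, n_lags)
    if mask is not None:
        X, envelope = X[mask], envelope[mask]
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                           X.T @ envelope)

def reconstruct(eeg, w, n_lags):
    """Decode an envelope estimate from new EEG."""
    return lag_matrix(eeg, n_lags) @ w

# Synthetic check: an envelope linearly driven by the EEG is recoverable.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((2000, 8))
env = lag_matrix(eeg, 3) @ rng.standard_normal(24)
mask = np.ones(2000, dtype=bool)   # pretend every sample was attended
w = train_decoder(eeg, env, n_lags=3, lam=1e-6, mask=mask)
r = np.corrcoef(reconstruct(eeg, w, n_lags=3), env)[0, 1]
```

In a real attention-decoding analysis, `mask` would be false during segments the recall task flags as masker-directed or unfocused, which is the kind of behaviorally informed training the study evaluates.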
Pages: 13