Noise-robust cortical tracking of attended speech in real-world acoustic scenes

Cited by: 137

Authors
Fuglsang, Soren Asp [1 ]
Dau, Torsten [1 ]
Hjortkjaer, Jens [1 ,2 ]
Affiliations
[1] Tech Univ Denmark, Dept Elect Engn, Hearing Syst Grp, Bldg 352, DK-2800 Lyngby, Denmark
[2] Univ Copenhagen, Hosp Hvidovre, Ctr Funct & Diagnost Imaging & Res, Danish Res Ctr Magnet Resonance, Kettegaard Alle 30, DK-2650 Hvidovre, Denmark
Keywords
Auditory attention; Speech; Cortical entrainment; EEG; Decoding; Acoustic simulations; Delta rhythms; Theta rhythms; HUMAN AUDITORY-CORTEX; NEURONAL OSCILLATIONS; SELECTIVE ATTENTION; PHASE PATTERNS; COCKTAIL PARTY; ENTRAINMENT; REPRESENTATION; RESPONSES; ENVELOPE; SOUND
DOI
10.1016/j.neuroimage.2017.04.026
Chinese Library Classification
Q189 [Neuroscience]
Discipline code
071006
Abstract
Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream of a particular talker. Across the different listening environments, we found that the attended talker could be accurately decoded from single-trial EEG data irrespective of the different distortions in the acoustic input. For highly reverberant environments, speech envelopes reconstructed from neural responses to the distorted stimuli resembled the original clean signal more than the distorted input. With reverberant speech, we observed a late cortical response to the attended speech stream that encoded temporal modulations in the speech signal without its reverberant distortion. Single-trial attention decoding accuracies based on 40-50 s long blocks of data from 64 scalp electrodes were equally high (80-90% correct) in all considered listening environments and remained statistically significant using down to 10 scalp electrodes and short (<30 s) unaveraged EEG segments. In contrast to the robust decoding of the attended talker, we found that decoding of the unattended talker deteriorated with the acoustic distortions.
These results suggest that cortical activity tracks an attended speech signal in a way that is invariant to acoustic distortions encountered in real-life sound environments. Noise-robust attention decoding additionally suggests a potential utility of stimulus reconstruction techniques in attention-controlled brain-computer interfaces.
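The stimulus-reconstruction approach described in the abstract (decoding the attended talker by reconstructing the speech envelope from EEG and correlating it with each talker's envelope) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the time-lag range, the ridge regularization, and the synthetic data are assumptions; function names such as `train_decoder` are hypothetical.

```python
import numpy as np

def build_lagged(eeg, lags):
    """Stack time-lagged copies of each EEG channel (time x channels)."""
    n_t, n_ch = eeg.shape
    X = np.zeros((n_t, n_ch * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(eeg, lag, axis=0)
        # Zero out samples wrapped around by np.roll.
        if lag > 0:
            shifted[:lag] = 0
        elif lag < 0:
            shifted[lag:] = 0
        X[:, i * n_ch:(i + 1) * n_ch] = shifted
    return X

def train_decoder(eeg, envelope, lags, alpha=1.0):
    """Ridge-regression backward model mapping lagged EEG to the envelope."""
    X = build_lagged(eeg, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]),
                           X.T @ envelope)

def decode_attention(eeg, env_a, env_b, decoder, lags):
    """Reconstruct the envelope and pick the talker it correlates with more."""
    recon = build_lagged(eeg, lags) @ decoder
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return 'A' if r_a > r_b else 'B'
```

In practice the decoder is trained on data where the attended talker is known, and decoding accuracy is then evaluated on held-out single-trial segments, as in the 40-50 s blocks reported above.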
Pages: 435-444 (10 pages)