Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception

Cited by: 37
Authors
Ceolini, Enea [1 ,2 ]
Hjortkjaer, Jens [3 ,4 ]
Wong, Daniel D. E. [5 ,6 ]
O'Sullivan, James [7 ,8 ]
Raghavan, Vinay S. [7 ,8 ]
Herrero, Jose [9 ]
Mehta, Ashesh D. [9 ]
Liu, Shih-Chii [1 ,2 ]
Mesgarani, Nima [7 ,8 ]
Affiliations
[1] Univ Zurich, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Inst Neuroinformat, Zurich, Switzerland
[3] Danmarks Tekniske Univ DTU, Dept Hlth Technol, Lyngby, Denmark
[4] Copenhagen Univ Hosp Hvidovre, Danish Res Ctr Magnet Resonance, Hvidovre, Denmark
[5] CNRS, Lab Syst Perceptifs, UMR 8248, Paris, France
[6] PSL Res Univ, Ecole Normale Super, Dept Etud Cognit, Paris, France
[7] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
[8] Columbia Univ, Mortimer B Zuckerman Mind Brain Behav Inst, New York, NY 10027 USA
[9] Hofstra Northwell Sch Med, Dept Neurosurg, Manhasset, NY USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
EEG; Neuro-steered; Cognitive control; Speech separation; Deep learning; Hearing aid; AUDITORY ATTENTION; ATTENDED SPEAKER; LCMV BEAMFORMER; HEARING-LOSS; TRACKING; ALGORITHMS; TIME;
DOI
10.1016/j.neuroimage.2020.117282
CLC number
Q189 [Neuroscience];
Subject classification code
071006;
Abstract
Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of "neuro-steered" hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS) in which the information about the attended speech, as decoded from the subject's brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.
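The core idea described in the abstract, conditioning a speech separation front-end directly on a brain-decoded representation of the attended speech, can be illustrated with a minimal Python (PyTorch) sketch. Everything below is an illustrative assumption rather than the architecture from the paper: the class name BISSSketch, the layer sizes, the STFT-mask design, and the additive fusion of mixture features with a per-frame decoded envelope are all hypothetical choices made only to show the shape of such a system.

# Minimal sketch (not the authors' code): a mask-based extraction network
# that is steered by a brain-decoded envelope of the attended speech.
# All module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class BISSSketch(nn.Module):
    def __init__(self, n_fft=512, hop=128, hidden=256):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        n_bins = n_fft // 2 + 1
        # Encode the mixture magnitude spectrogram frame by frame.
        self.mix_proj = nn.Linear(n_bins, hidden)
        # Encode the decoded attended-speech envelope (one value per frame).
        self.env_proj = nn.Linear(1, hidden)
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        # Predict a time-frequency mask for the attended speaker.
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, mixture, envelope):
        # mixture: (batch, samples); envelope: (batch, frames), e.g. decoded
        # from EEG/iEEG by a stimulus-reconstruction decoder (not shown here).
        window = torch.hann_window(self.n_fft, device=mixture.device)
        spec = torch.stft(mixture, self.n_fft, self.hop, window=window,
                          return_complex=True)           # (batch, bins, frames)
        mag = spec.abs().transpose(1, 2)                 # (batch, frames, bins)
        # Additive fusion of acoustic and neural features (an assumption).
        h = self.mix_proj(mag) + self.env_proj(envelope.unsqueeze(-1))
        h, _ = self.rnn(h)
        m = self.mask(h).transpose(1, 2)                 # (batch, bins, frames)
        est = spec * m                                   # masked spectrogram
        return torch.istft(est, self.n_fft, self.hop, window=window,
                           length=mixture.shape[-1])

if __name__ == "__main__":
    model = BISSSketch()
    mix = torch.randn(1, 16000)                          # 1 s of 16 kHz audio
    frames = 16000 // 128 + 1                            # matches torch.stft framing
    env = torch.rand(1, frames)                          # stand-in decoded envelope
    print(model(mix, env).shape)                         # torch.Size([1, 16000])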
Pages: 12