A Tutorial on Auditory Attention Identification Methods

Cited by: 69
Authors
Alickovic, Emina [1,2]
Lunner, Thomas [1,2,3,4]
Gustafsson, Fredrik [1]
Ljung, Lennart [1]
Affiliations
[1] Linköping University, Department of Electrical Engineering, Linköping, Sweden
[2] Oticon A/S, Eriksholm Research Centre, Snekkersten, Denmark
[3] Technical University of Denmark, Department of Health Technology, Hearing Systems, Lyngby, Denmark
[4] Linköping University, Linnaeus Centre HEAD, Swedish Institute for Disability Research, Linköping, Sweden
Funding
European Union Horizon 2020; Swedish Research Council
Keywords
cocktail-party problem; auditory attention; linear models; stimulus reconstruction; canonical correlation analysis (CCA); decoding; encoding; sparse representation; CANONICAL CORRELATION-ANALYSIS; COCKTAIL PARTY; SPEECH-INTELLIGIBILITY; CORTICAL TRACKING; ATTENDED SPEECH; TIME; NOISE; REGRESSION; SELECTION; SPEAKER
DOI
10.3389/fnins.2019.00153
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Discipline code
071006
Abstract
Auditory attention identification methods attempt to identify the sound source of a listener's interest by analyzing electrophysiological measurements. We present a tutorial on the numerous techniques that have been developed in recent decades, together with an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on linear relations between electrophysiological and audio data; the methods differ in how these relations are computed. For example, canonical correlation analysis (CCA) finds a linear combination of the electrophysiological data and a linear combination of the audio data that are maximally correlated with each other. Model-based approaches instead work in only one direction: encoding models predict the electrophysiological data from the audio, whereas decoding models reconstruct the audio from the electrophysiological data. We investigate the similarities and differences between these linear modeling philosophies, focusing on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. Particular attention is given to sparsity-driven adaptive encoding models and to how their methodology compares with state-of-the-art models in the auditory literature. Furthermore, we outline the main signal processing pipeline for identifying the attended sound source in a cocktail-party environment from raw electrophysiological data, covering all necessary steps and complemented with MATLAB code and relevant references for each step. Our main aim is to compare the methodology of the available methods and to provide numerical illustrations for some of them that give a sense of their potential. A thorough performance comparison is outside the scope of this tutorial.
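The decoding (backward, stimulus-reconstruction) approach described above can be illustrated with a short sketch. The following is a minimal NumPy example, not the authors' MATLAB code: it assumes one speech envelope per speaker, a dense ridge-regularized linear decoder over a fixed window of EEG lags, and Pearson correlation as the attention decision rule. The function names (`lagged_design`, `train_decoder`, `identify_attended`, `simulate_trial`), the number of lags, the ridge value, and the synthetic data model are all hypothetical choices made for illustration only.

```python
# Minimal sketch (hypothetical, not the authors' code) of a dense backward model:
# reconstruct the attended speech envelope from lagged EEG with ridge regression,
# then pick the speaker whose envelope correlates best with the reconstruction.
import numpy as np


def lagged_design(eeg: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack time-lagged copies of every EEG channel.

    Row t holds eeg[t + k, c] for k = 0..n_lags-1 (zero-padded at the end),
    so the reconstruction at time t can use neural activity that follows it.
    """
    n_samples, n_channels = eeg.shape
    design = np.zeros((n_samples, n_lags * n_channels))
    for k in range(n_lags):
        design[: n_samples - k, k * n_channels:(k + 1) * n_channels] = eeg[k:]
    return design


def train_decoder(eeg: np.ndarray, envelope: np.ndarray,
                  n_lags: int, ridge: float) -> np.ndarray:
    """Ridge-regularized least squares: w = (A^T A + ridge * I)^{-1} A^T s."""
    a = lagged_design(eeg, n_lags)
    gram = a.T @ a + ridge * np.eye(a.shape[1])
    return np.linalg.solve(gram, a.T @ envelope)


def identify_attended(eeg: np.ndarray, envelopes: list[np.ndarray],
                      weights: np.ndarray, n_lags: int) -> tuple[int, list[float]]:
    """Reconstruct the envelope and return the index of the best-correlated speaker."""
    reconstruction = lagged_design(eeg, n_lags) @ weights
    scores = [float(np.corrcoef(reconstruction, env)[0, 1]) for env in envelopes]
    return int(np.argmax(scores)), scores


def simulate_trial(rng: np.random.Generator, mixing: np.ndarray,
                   n_samples: int = 2000, delay: int = 3):
    """Hypothetical toy data: EEG tracks the attended envelope strongly and the
    unattended one weakly (gain 0.3), delayed by `delay` samples, plus noise."""
    kernel = np.ones(8) / 8.0  # crude low-pass filter so envelopes are smooth
    attended = np.convolve(rng.standard_normal(n_samples), kernel, mode="same")
    unattended = np.convolve(rng.standard_normal(n_samples), kernel, mode="same")
    source = np.roll(attended + 0.3 * unattended, delay)  # EEG lags the audio
    noise = rng.standard_normal((n_samples, mixing.size))
    return np.outer(source, mixing) + noise, attended, unattended


rng = np.random.default_rng(0)
mixing = rng.standard_normal(16)  # fixed spatial pattern shared by all trials
N_LAGS = 6                        # covers the simulated 3-sample delay

eeg_train, att_train, _ = simulate_trial(rng, mixing)
weights = train_decoder(eeg_train, att_train, N_LAGS, ridge=10.0)

eeg_test, att_test, unatt_test = simulate_trial(rng, mixing)
# The unattended speaker is listed first so the correct answer is index 1.
choice, scores = identify_attended(eeg_test, [unatt_test, att_test], weights, N_LAGS)
print(choice, [round(s, 2) for s in scores])
```

Because this decoder is a dense least-squares estimate, essentially every channel-lag weight is nonzero; the sparse and adaptive variants discussed in the tutorial would instead use a sparsity-promoting estimator, and the CCA approach would project both the EEG and the audio features rather than reconstructing the audio alone.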
Pages: 17