Speech segregation based on pitch tracking and amplitude modulation

Cited by: 61

Authors:

Hu, GN [1]

Wang, DL [1]

Affiliations:

[1] Ohio State Univ, Biophys Program, Columbus, OH 43210 USA

Source:

PROCEEDINGS OF THE 2001 IEEE WORKSHOP ON THE APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS | 2001

DOI:

10.1109/ASPAA.2001.969547

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. In this paper, we extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
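As a rough illustration of the pitch-tracking idea named in the abstract, the sketch below estimates the fundamental frequency of a single frame from the peak of its autocorrelation. It is not the authors' model, which the abstract describes only as an extension of the Wang-Brown oscillator network. The function name `estimate_pitch_autocorr`, the search range of 80 to 400 Hz, and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np


def estimate_pitch_autocorr(frame, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Hypothetical helper: picks the lag with the largest autocorrelation
    inside the plausible pitch-period range [1/f_max, 1/f_min].
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)  # remove DC offset
    # Full autocorrelation; keep only non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


# Usage: a synthetic harmonic complex with a 200 Hz fundamental (assumed data).
sample_rate = 16000
t = np.arange(640) / sample_rate  # one 40 ms frame
frame = (np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)
         + 0.25 * np.sin(2 * np.pi * 600 * t))
f0 = estimate_pitch_autocorr(frame, sample_rate)
print(f"estimated F0: {f0:.1f} Hz")
```

Tracking pitch over time would repeat this estimate on successive frames. Grouping by amplitude modulation would additionally examine the envelope of high-frequency channels, which this sketch does not attempt.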

Citation:

Pages: 79-82

Number of pages: 4

References (10 in total):

[1] Albert S. Bregman, 1990, AUDITORY SCENE ANAL, P411, DOI 10.1121/1.408434, DOI 10.7551/MITPRESS/1486.001.0001

[2] BROWN, GJ; COOKE, M. COMPUTATIONAL AUDITORY SCENE ANALYSIS [J]. COMPUTER SPEECH AND LANGUAGE, 1994, 8 (04): 297-336

[3] Cooke, M; Green, P; Josifovski, L; Vizinho, A. Robust automatic speech recognition with missing and unreliable acoustic data [J]. SPEECH COMMUNICATION, 2001, 34 (03): 267-285

[4] COOKE M, 1993, MODELING AUDITORY PR

[5] ELLIS D, 1996, THESIS MIT

[6] JUTTEN, C; HERAULT, J. BLIND SEPARATION OF SOURCES .1. AN ADAPTIVE ALGORITHM BASED ON NEUROMIMETIC ARCHITECTURE [J]. SIGNAL PROCESSING, 1991, 24 (01): 1-10

[7] Lee, TW; Lewicki, MS; Girolami, M; Sejnowski, TJ. Blind source separation of more sources than mixtures using overcomplete representations [J]. IEEE SIGNAL PROCESSING LETTERS, 1999, 6 (04): 87-90

[8] Moore B.C.J., 1997, INTRO PSYCHOL HEARIN

[9] van der Kouwe, AJW; Wang, DL; Brown, GJ. A comparison of auditory and blind separation techniques for speech segregation [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (03): 189-195

[10] Wang, DLL; Brown, GJ. Separation of speech from interfering sounds based on oscillatory correlation [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999, 10 (03): 684-697