Sequential organization of speech in computational auditory scene analysis

Cited by: 16
Authors
Shao, Yang [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Keywords
Sequential organization; Computational auditory scene analysis; Speaker quantization; Binary time-frequency mask; MODEL; SEGREGATION; TRACKING
DOI
10.1016/j.specom.2009.02.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
A human listener has the ability to follow a speaker's voice over time in the presence of other talkers and non-speech interference. This paper proposes a general system for sequential organization of speech based on speaker models. By training a general background model, the proposed system is shown to function well with both interfering talkers and non-speech intrusions. To deal with situations where prior information about specific speakers is not available, a speaker quantization method is employed to extract representative models from a large speaker space, and the resulting generic models are used to perform sequential grouping. Our systematic evaluations show that grouping performance using generic models is only moderately lower than the performance level achieved with known speaker models. (C) 2009 Elsevier B.V. All rights reserved.
Pages: 657-667
Page count: 11
Related papers
50 in total
  • [41] Auditory scene analysis: Examining the role of nonlinguistic auditory processing in speech perception
    Sussman, ES
    SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 5 - 12
  • [42] Sequential auditory scene analysis is preserved in normal aging adults
    Snyder, Joel S.
    Alain, Claude
    CEREBRAL CORTEX, 2007, 17 (03) : 501 - 512
  • [43] Robust speaker identification using auditory features and computational auditory scene analysis
    Shao, Yang
    Wang, DeLiang
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 1589 - 1592
  • [44] SNR-Based Mask Compensation for Computational Auditory Scene Analysis Applied to Speech Recognition in a Car Environment
    Park, Ji Hun
    Kim, Seon Man
    Yoon, Jae Sam
    Kim, Hong Kook
    Lee, Sung Joo
    Lee, Yunkeun
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 725 - +
  • [45] On ideal binary mask as the computational goal of auditory scene analysis
    Wang, DL
    SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 181 - 197
  • [46] COMPUTATIONAL AUDITORY SCENE ANALYSIS - EXPLOITING PRINCIPLES OF PERCEIVED CONTINUITY
    COOKE, MP
    BROWN, GJ
    SPEECH COMMUNICATION, 1993, 13 (3-4) : 391 - 399
  • [47] COMPUTATIONAL AUDITORY SCENE ANALYSIS - LISTENING TO SEVERAL THINGS AT ONCE
    COOKE, M
    BROWN, GJ
    CRAWFORD, M
    GREEN, P
    ENDEAVOUR, 1993, 17 (04) : 186 - 190
  • [48] A Computational Approach to the Dynamic Aspects of Primitive Auditory Scene Analysis
    Kashino, Makio
    Adachi, Eisuke
    Hirose, Haruto
    BASIC ASPECTS OF HEARING: PHYSIOLOGY AND PERCEPTION, 2013, 787 : 519 - 526
  • [49] Computational auditory scene analysis and its application to robot audition
    Okuno, Hiroshi G.
    Nakadai, Kazuhiro
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 125 - +
  • [50] Computational auditory scene analysis and its application to robot audition
    Okuno, HG
    Ogata, T
    Komatani, K
    Nakadai, K
    INTERNATIONAL CONFERENCE ON INFORMATICS RESEARCH FOR DEVELOPMENT OF KNOWLEDGE SOCIETY INFRASTRUCTURE, PROCEEDINGS, 2004, : 73 - 80