MUSIC MODELS FOR MUSIC-SPEECH SEPARATION

Authors
Hughes, Thad
Kristjansson, Trausti
Source
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012
Keywords
ASR; noise robustness; noise reduction; non-stationary noise; music
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We consider the task of speech recognition with loud background music interference. We use model-based music-speech separation and train GMM models for music on the audio prior to speech. We show over 8% relative improvement in WER at 10 dB SNR for a real-world Voice Search ASR system. We investigate the relationship between ASR accuracy and both the amount of music background used as prologue and the size of the music models. Our study shows that performance peaks when using a music prologue of around 6 seconds to train the music model. We hypothesize that this is due to the dynamic nature of music and the structure of popular music. Adding more history beyond a certain point does not improve results. Additionally, we show that moderately sized 8-component music GMM models suffice to model this amount of music prologue.
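The abstract's central idea, fitting a small GMM to the music heard just before the speech starts, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the feature domain, the EM details, or how the separation step uses the model. Everything below is an assumption made for illustration, including the 100 frames-per-second rate, the two-dimensional synthetic "music" frames, the diagonal covariances, and the function names `fit_diag_gmm` and `log_likelihood`. Only the 8 components and the roughly 6-second prologue come from the abstract.

```python
import math
import random


def component_log_terms(x, weights, means, variances):
    """Return log(w_k) + log N(x | mean_k, diag(var_k)) for every component k."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, m, v in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v)
        terms.append(ll)
    return terms


def log_likelihood(x, model):
    """Log-likelihood of one frame under the mixture (log-sum-exp for stability)."""
    terms = component_log_terms(x, *model)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_diag_gmm(frames, n_components=8, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to a list of feature frames with plain EM."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from randomly chosen frames, with unit variances.
    means = [list(f) for f in rng.sample(frames, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            terms = component_log_terms(x, weights, means, variances)
            top = max(terms)
            exps = [math.exp(t - top) for t in terms]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and floored variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            for d in range(dim):
                mu = sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                var = sum(r[k] * (x[d] - mu) ** 2 for r, x in zip(resp, frames)) / nk
                means[k][d] = mu
                variances[k][d] = max(var, var_floor)
    return weights, means, variances


FRAMES_PER_SECOND = 100  # assumption: 10 ms frame hop
PROLOGUE_SECONDS = 6     # prologue length reported as best in the abstract

# Synthetic stand-in for music prologue features (assumption: two clusters in 2-D).
data_rng = random.Random(1)
centres = [(0.0, 0.0), (3.0, 3.0)]
prologue = [
    [data_rng.gauss(c, 0.5) for c in data_rng.choice(centres)]
    for _ in range(FRAMES_PER_SECOND * PROLOGUE_SECONDS)
]

model = fit_diag_gmm(prologue, n_components=8)

# A frame resembling the prologue should score higher than an unrelated frame.
print(log_likelihood([0.1, -0.2], model))
print(log_likelihood([10.0, -10.0], model))
```

In a model-based separation system, a music model of this kind would be combined with a speech model to estimate the speech component of each mixed frame; that step is outside the scope of this sketch.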
Pages: 4917-4920
Page count: 4