MUSIC MODELS FOR MUSIC-SPEECH SEPARATION

Cited by: 0
Authors
Hughes, Thad
Kristjansson, Trausti
Affiliations
Source
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012
Keywords
ASR; noise robustness; noise reduction; non-stationary noise; music;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We consider the task of speech recognition with loud music background interference. We use model-based music-speech separation and train GMM models for music on the audio prior to speech. We show over 8% relative improvement in WER at 10 dB SNR for a real-world Voice Search ASR system. We investigate the relationship between ASR accuracy and both the amount of music background used as prologue and the size of the music models. Our study shows that performance peaks when using a music prologue of around 6 seconds to train the music model. We hypothesize that this is due to the dynamic nature of music and the structure of popular music. Adding more history beyond a certain point does not improve results. Additionally, we show that moderately sized 8-component music GMM models suffice to model this amount of music prologue.
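The idea of training a small music GMM on the audio that precedes the speech can be illustrated with a minimal sketch. This is not the authors' implementation: the feature type, the frame rate (100 frames per second), the 13-dimensional features, the diagonal covariances, the EM settings, and the synthetic data are all assumptions made for illustration. Only the 8 components and the roughly 6-second prologue come from the abstract. The sketch fits the GMM to the prologue frames and then checks that later music-like frames score higher under the model than unrelated frames; the separation step itself is not shown.

```python
import numpy as np


def component_log_probs(X, weights, means, variances):
    """Return an (n_frames, n_components) array of log(w_k) + log N(x | mu_k, diag(var_k))."""
    diff = X[:, None, :] - means[None, :, :]
    log_det = np.log(2.0 * np.pi * variances).sum(axis=1)      # shape (k,)
    maha = (diff ** 2 / variances[None, :, :]).sum(axis=2)     # shape (n, k)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + maha)


def fit_diag_gmm(X, n_components=8, n_iter=25, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to frames X of shape (n_frames, n_dims) with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise means from randomly chosen frames and share the global variance.
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = component_log_probs(X, weights, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, var_floor)
    return weights, means, variances


def avg_log_likelihood(X, model):
    """Average per-frame log-likelihood of X under a (weights, means, variances) model."""
    log_p = component_log_probs(X, *model)
    return float(np.logaddexp.reduce(log_p, axis=1).mean())


# --- Synthetic demonstration (all values below are hypothetical) ---
FRAMES_PER_SECOND = 100   # assumed frame rate
PROLOGUE_SECONDS = 6      # prologue length the abstract reports as near-optimal
N_DIMS = 13               # assumed feature dimensionality

rng = np.random.default_rng(1)
centres = rng.normal(0.0, 3.0, size=(3, N_DIMS))   # stand-in for recurring musical textures


def synth_music(n_frames):
    """Draw frames around the hypothetical music centres."""
    labels = rng.integers(0, 3, size=n_frames)
    return centres[labels] + rng.normal(0.0, 0.5, size=(n_frames, N_DIMS))


prologue = synth_music(FRAMES_PER_SECOND * PROLOGUE_SECONDS)
model = fit_diag_gmm(prologue, n_components=8)

later_music = synth_music(200)                              # music continuing under the speech
other_audio = rng.normal(10.0, 0.5, size=(200, N_DIMS))     # frames unlike the prologue

ll_music = avg_log_likelihood(later_music, model)
ll_other = avg_log_likelihood(other_audio, model)
print(f"music-like frames: {ll_music:.1f}, unrelated frames: {ll_other:.1f}")
```

In a model-based separation system, a music model of this kind would be combined with a speech model to estimate the clean speech features; how the two are combined is outside the scope of this sketch.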
Pages: 4917-4920
Page count: 4