VOICE ACTIVITY DETECTION USING CONVOLUTIVE NON-NEGATIVE SPARSE CODING

Cited by: 0
Authors
Teng, Peng [1]
Jia, Yunde [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp, Beijing, Peoples R China
Keywords
voice activity detection; convolutive non-negative sparse coding; conditional random fields
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a voice activity detection (VAD) approach that uses convolutive non-negative sparse coding (CNSC) to improve detection performance in low signal-to-noise ratio (SNR) conditions. Our idea is to detect speech using noise-robust features from which the noise contribution has been removed. We first use the magnitude spectrum as a non-negative, additive low-level representation of audio signals, and learn a speech dictionary from clean speech and a noise dictionary from noise samples. The two dictionaries are then concatenated to form a global dictionary, and an audio signal is decomposed into coefficient vectors using CNSC on the global dictionary. Only the coefficients corresponding to bases from the speech dictionary are taken as the features of the signal. Finally, the activity labels are obtained by decoding a conditional random field (CRF) constructed to model the context of an audio signal for VAD. Experiments demonstrate that our VAD approach performs very well in low-SNR conditions.
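The feature-extraction step described in the abstract (concatenate the two dictionaries, decompose the spectrogram, keep only the speech coefficients) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain (non-convolutive) non-negative sparse coding with a Euclidean cost and an L1 penalty, solved by multiplicative updates on a fixed dictionary, as a simplified stand-in for CNSC. The function names, the `sparsity` weight, and the toy dictionaries are all hypothetical, and the CRF decoding stage is omitted.

```python
import numpy as np


def nnsc_activations(V, W, sparsity=0.01, n_iter=200, eps=1e-12):
    """Non-negative sparse coding of V on a FIXED dictionary W.

    Assumption: non-convolutive model V ~= W @ H with an L1 penalty on H,
    used here as a simplified stand-in for CNSC.
    V: (n_freq, n_frames) magnitude spectrogram, W: (n_freq, n_bases).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))  # strictly positive start
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update keeps H non-negative at every step.
        H *= WtV / (WtW @ H + sparsity + eps)
    return H


def speech_features(V, W_speech, W_noise, **kwargs):
    """Decompose V on the global dictionary and keep the speech rows only."""
    W_global = np.hstack([W_speech, W_noise])  # speech bases come first
    H = nnsc_activations(V, W_global, **kwargs)
    return H[: W_speech.shape[1]]


# Toy example (hypothetical data): 6 frequency bins, 2 speech and 2 noise bases.
W_speech = np.array([[1, 0], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0]], dtype=float)
W_noise = np.array([[0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
noise_only = 1.0 * W_noise[:, 0]
speech_plus_noise = 2.0 * W_speech[:, 0] + 1.0 * W_noise[:, 1]
V = np.stack([noise_only, speech_plus_noise], axis=1)

features = speech_features(V, W_speech, W_noise)
print(features)
```

In this toy setup the speech rows are inactive for the noise-only frame and active for the frame that contains speech, which is the property the abstract relies on: noise energy is absorbed by the noise bases, so the retained coefficients stay informative at low SNR. A full system would use time-shifted (convolutive) bases and pass the per-frame features to a sequence model such as a CRF.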
Pages: 7373-7377
Number of pages: 5
Related papers
50 in total
  • [1] Convolutive Non-Negative Sparse Coding
    Wang, Wenwu
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 3681 - 3684
  • [2] SPEECH OVERLAP DETECTION AND ATTRIBUTION USING CONVOLUTIVE NON-NEGATIVE SPARSE CODING
    Vipperla, Ravichander
    Geiger, Juergen T.
    Bozonnet, Simon
    Wang, Dong
    Evans, Nicholas
    Schuller, Bjoern
    Rigoll, Gerhard
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4181 - 4184
  • [3] Heterogeneous Convolutive Non-Negative Sparse Coding
    Wang, Dong
    Tejedor, Javier
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2147 - 2150
  • [4] Voice Activity Detection Via Noise Reducing Using Non-Negative Sparse Coding
    Teng, Peng
    Jia, Yunde
    IEEE SIGNAL PROCESSING LETTERS, 2013, 20 (05) : 475 - 478
  • [5] SPEECH OVERLAP DETECTION USING CONVOLUTIVE NON-NEGATIVE SPARSE CODING: NEW IMPROVEMENTS AND INSIGHTS
    Geiger, Juergen T.
    Vipperla, Ravichander
    Evans, Nicholas
    Schuller, Bjoern
    Rigoll, Gerhard
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 340 - 344
  • [6] Online Pattern Learning for Non-Negative Convolutive Sparse Coding
    Wang, Dong
    Vipperla, Ravichander
    Evans, Nicholas
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 72 - 75
  • [7] The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization
    Zhang, Qianmin
    Tao, Liang
    Zhou, Jian
    Wang, Huabin
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ELECTRICAL AND INFORMATION TECHNOLOGIES FOR RAIL TRANSPORTATION: TRANSPORTATION, 2016, 378 : 259 - 267
  • [8] Non-negative sparse coding
    Hoyer, PO
    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, 2002, : 557 - 565
  • [9] Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization
    Geiger, Juergen T.
    Vipperla, Ravichander
    Bozonnet, Simon
    Evans, Nicholas
    Schuller, Bjoern
    Rigoll, Gerhard
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2151 - 2154
  • [10] Convolutive Sparse Non-negative Matrix Factorization for Windy Speech
    Lai Xiaoqiang
    Li Shuangtian
    Yang Jie
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 494 - 497