SPEECH OVERLAP DETECTION USING CONVOLUTIVE NON-NEGATIVE SPARSE CODING: NEW IMPROVEMENTS AND INSIGHTS

Cited by: 0
Authors
Geiger, Juergen T. [1]
Vipperla, Ravichander [2]
Evans, Nicholas [2]
Schuller, Bjoern [1]
Rigoll, Gerhard [1]
Affiliations
[1] Tech Univ Munich, Inst Human Machine Commun, D-8000 Munich, Germany
[2] EURECOM, Multimedia Commun Dept, Sophia Antipolis, France
Keywords
speech overlap detection; convolutive non-negative sparse coding; speaker diarization
DOI
Not available
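Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809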
Abstract
This paper presents recent advances in the application of convolutive non-negative sparse coding (CNSC) to the problem of overlap detection in the context of conference meetings and speaker diarization. CNSC is used to project a mixed speaker signal onto separate speaker bases and hence to detect intervals of competing speech. We present new energy ratio and total energy features which give significant improvements over our previous work. The system is assessed using a subset of the AMI meeting corpus. We report results that are comparable to the state of the art and that support the potential of a new approach to overlap detection. An analysis of system performance highlights the importance of further work to address weaknesses in detecting particularly short segments of overlapping speech.
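The abstract names energy ratio and total energy features but does not define them. The following is only an illustrative sketch, assuming that the CNSC projection has already produced a non-negative energy value per speaker and per frame. The function names, the ratio of the second-strongest to the strongest speaker, and both thresholds are hypothetical choices made for this example, not the paper's definitions.

```python
import numpy as np


def overlap_features(speaker_energy, eps=1e-12):
    """Return (energy_ratio, total_energy) per frame.

    speaker_energy: array of shape (n_speakers, n_frames) holding the
    non-negative energy attributed to each speaker's bases (assumed input).
    The ratio definition below is an assumption made for illustration.
    """
    e = np.asarray(speaker_energy, dtype=float)
    if e.ndim != 2 or e.shape[0] < 2:
        raise ValueError("expected shape (n_speakers >= 2, n_frames)")
    total = e.sum(axis=0)
    # Sort each frame's speaker energies in descending order.
    ranked = np.sort(e, axis=0)[::-1]
    # Second-strongest over strongest speaker: near 1 when two speakers compete.
    ratio = ranked[1] / (ranked[0] + eps)
    return ratio, total


def detect_overlap(speaker_energy, ratio_threshold=0.5, energy_threshold=1.0):
    """Flag frames where two speakers are comparably active and loud enough.

    Both thresholds are hypothetical values chosen for this toy example.
    """
    ratio, total = overlap_features(speaker_energy)
    return (ratio > ratio_threshold) & (total > energy_threshold)


# Toy input: 2 speakers, 3 frames (single speaker, overlap, near silence).
energy = np.array([[4.0, 3.0, 0.10],
                   [0.2, 2.5, 0.05]])
flags = detect_overlap(energy)
```

In this sketch the total energy term keeps low-energy frames from being flagged when the ratio alone would be ambiguous; a real system would more likely feed such features to a trained classifier than apply fixed thresholds.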
Pages: 340-344
Page count: 5
Related papers
50 in total
  • [1] SPEECH OVERLAP DETECTION AND ATTRIBUTION USING CONVOLUTIVE NON-NEGATIVE SPARSE CODING
    Vipperla, Ravichander
    Geiger, Juergen T.
    Bozonnet, Simon
    Wang, Dong
    Evans, Nicholas
    Schuller, Bjoern
    Rigoll, Gerhard
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4181-4184
  • [2] Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization
    Geiger, Juergen T.
    Vipperla, Ravichander
    Bozonnet, Simon
    Evans, Nicholas
    Schuller, Bjoern
    Rigoll, Gerhard
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2151-2154
  • [3] VOICE ACTIVITY DETECTION USING CONVOLUTIVE NON-NEGATIVE SPARSE CODING
    Teng, Peng
    Jia, Yunde
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7373-7377
  • [4] Convolutive Non-Negative Sparse Coding
    Wang, Wenwu
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008: 3681-3684
  • [5] Heterogeneous Convolutive Non-Negative Sparse Coding
    Wang, Dong
    Tejedor, Javier
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2147-2150
  • [6] Online Pattern Learning for Non-Negative Convolutive Sparse Coding
    Wang, Dong
    Vipperla, Ravichander
    Evans, Nicholas
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 72-75
  • [7] Convolutive Sparse Non-negative Matrix Factorization for Windy Speech
    Lai Xiaoqiang
    Li Shuangtian
    Yang Jie
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010: 494-497
  • [8] Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation
    Carlin, Michael A.
    Malyska, Nicolas
    Quatieri, Thomas F.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 582-585
  • [9] Non-negative sparse coding
    Hoyer, PO
    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, 2002: 557-565
  • [10] LEARNING SPEECH FEATURES IN THE PRESENCE OF NOISE: SPARSE CONVOLUTIVE ROBUST NON-NEGATIVE MATRIX FACTORIZATION
    de Frein, Ruairi
    Rickard, Scott T.
    2009 16TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009: 1248-1253