An advanced entropy-based feature with a frame-level vocal effort likelihood space modeling for distant whisper-island detection

Cited by: 2

Authors:

Zhang, Chi [1]

Hansen, John H. L. [1]

Affiliations:

[1] Univ Texas Dallas, Dept Elect Engn, Erik Jonsson Sch Engn & Comp Sci, CRSS, Richardson, TX 75080 USA

Source:

SPEECH COMMUNICATION | 2015 / Vol. 66

Keywords:

Vocal effort; Distant whisper; Detection; BIC; T^2-BIC; Segmentation; Clustering; Speaker identification

DOI:

10.1016/j.specom.2014.09.004

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Whisper-island detection is a challenging research problem that has received limited attention in the speech research community. Effective detection of whisper islands, or vocal effort change points (VECPs), is the first step needed before subsequent speech processing steps can address whisper effectively. In this study, we first propose an entropy-based feature, improved from a previous study, which is integrated within a model-less whisper-island detection algorithm. The improved 3-D WhID feature discriminates better between whisper and neutral speech, resulting in a 0.00% MDR (miss detection rate), a lower FAR (false alarm rate) and MMR (mismatch rate), and collectively a reduced MES (multi-error score). With improved VECP detection results and no need for a previously trained GMM, the BIC-based vocal effort clustering algorithm attains a 100% detection rate of whisper islands. This study also addresses the more challenging task of distant whisper-island detection, using a proposed frame-based vocal effort likelihood space modeling algorithm (model-based). A corpus named UT-VE-III is developed, consisting of spontaneous and read whisper-embedded neutral speech recorded with a microphone array at various distances in a real-world conference room. For the whisper-embedded neutral speech of UT-VE-III at 1-m, 3-m and 5-m distances, recorded with a Lavalier microphone and a distant microphone, the proposed algorithm sustains consistent performance in VECP detection and whisper classification rates. (C) 2014 Elsevier B.V. All rights reserved.

Pages: 107-117

Page count: 11

