POLYNOMIAL EIGENVALUE DECOMPOSITION-BASED TARGET SPEAKER VOICE ACTIVITY DETECTION IN THE PRESENCE OF COMPETING TALKERS

Times cited: 4

Authors:

Neo, Vincent W. [1]

Weiss, Stephan [2]

McKnight, Simon W. [1]

Hogg, Aidan O. T. [1]

Naylor, Patrick A. [1]

Affiliations:

[1] Imperial Coll London, Dept Elect & Elect Engn, London, England

[2] Univ Strathclyde, Dept Elect & Elect Engn, Glasgow, Lanark, Scotland

Source:

2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022) | 2022

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

polynomial eigenvalue decomposition; target speaker voice activity detection; speaker activity detection; MATRIX; ALGORITHM; EVD

DOI:

10.1109/IWAENC53105.2022.9914796

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded, improving both the performance and the computational efficiency of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing across multiple microphones to compute the syndrome energy, which is used to test for the presence or absence of the target speaker. Over the investigated range of signal-to-interference ratios (SIRs), from -10 dB to 20 dB, the proposed approach is consistently among the best-performing methods in terms of F1 score and balanced accuracy.
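The abstract names the ingredients of the detector (multichannel frames, a target subspace, and a syndrome energy used for a presence/absence test) together with the two evaluation scores. The Python sketch below is a hypothetical narrowband illustration under stated assumptions, not the authors' algorithm. It replaces the polynomial EVD with an ordinary EVD (`numpy.linalg.eigh`) of one spatial covariance matrix per STFT frequency bin. It also assumes that a target-only enrolment segment is available for estimating the target's subspace, and it takes the "syndrome" to be the part of a frame that lies outside that subspace. The function names `target_subspace`, `syndrome_energy` and `target_active`, the `threshold` value, and the rule "low syndrome energy means the target is active" are all illustrative assumptions. The last function computes the F1 score and balanced accuracy (the mean of the true-positive and true-negative rates) from frame-level reference and detected labels.

```python
import numpy as np

# Hypothetical narrowband sketch: an ordinary per-bin EVD stands in for the PEVD.


def target_subspace(X_target, rank=1):
    """Per-bin orthonormal basis of the target's dominant spatial subspace.

    X_target: complex STFT of a target-only enrolment segment (assumption),
    shape (F, M, T) = (frequency bins, microphones, frames).
    Returns U with shape (F, M, rank).
    """
    T = X_target.shape[2]
    # Sample spatial covariance R_f = (1/T) * sum_t x_f(t) x_f(t)^H for every bin f
    R = np.einsum("fmt,fnt->fmn", X_target, X_target.conj()) / T
    # eigh returns eigenvalues in ascending order, so the last columns are dominant
    _, V = np.linalg.eigh(R)
    return V[:, :, -rank:]


def syndrome_energy(x_frame, U):
    """Fraction of a frame's energy lying outside the target subspace (0 to 1).

    x_frame: complex STFT of one frame, shape (F, M); U from target_subspace.
    """
    # Projection onto the target subspace, U U^H x, evaluated per bin
    proj = np.einsum("fmr,fnr,fn->fm", U, U.conj(), x_frame)
    syndrome = x_frame - proj
    total = np.sum(np.abs(x_frame) ** 2)
    return float(np.sum(np.abs(syndrome) ** 2) / max(total, 1e-12))


def target_active(x_frame, U, threshold=0.5):
    # Hypothetical decision rule and threshold: declare the target active when
    # most of the frame's energy lies inside the target subspace
    return syndrome_energy(x_frame, U) < threshold


def f1_and_balanced_accuracy(ref, hyp):
    """F1 score and balanced accuracy for frame-level binary VAD labels."""
    pairs = [(bool(r), bool(h)) for r, h in zip(ref, hyp)]
    tp = sum(r and h for r, h in pairs)
    tn = sum(not r and not h for r, h in pairs)
    fp = sum(h and not r for r, h in pairs)
    fn = sum(r and not h for r, h in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, (recall + specificity) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, M, T = 64, 4, 200

    def cplx(*shape):
        # Circular complex Gaussian samples of the given shape
        return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

    a, b = cplx(F, M), cplx(F, M)  # synthetic target / interferer steering vectors per bin
    U = target_subspace(a[:, :, None] * cplx(F, 1, T))  # target-only enrolment
    print(syndrome_energy(a * cplx(F, 1), U))  # target-only frame: close to 0
    print(syndrome_energy(b * cplx(F, 1), U))  # interferer-only frame: much larger
```

With noiseless synthetic data, a target-only frame lies entirely inside the estimated subspace, so its syndrome energy is essentially zero. An interferer-only frame keeps the share of its energy that is orthogonal to the target's steering vectors. A PEVD instead operates on the polynomial space-time covariance matrix, so its decomposition is coherent across frequency rather than computed independently in each bin as in this sketch.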
