POLYNOMIAL EIGENVALUE DECOMPOSITION-BASED TARGET SPEAKER VOICE ACTIVITY DETECTION IN THE PRESENCE OF COMPETING TALKERS

Cited by: 4
Authors
Neo, Vincent W. [1]
Weiss, Stephan [2]
McKnight, Simon W. [1]
Hogg, Aidan O. T. [1]
Naylor, Patrick A. [1]
Affiliations
[1] Imperial Coll London, Dept Elect & Elect Engn, London, England
[2] Univ Strathclyde, Dept Elect & Elect Engn, Glasgow, Lanark, Scotland
Source
2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022) | 2022
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
polynomial eigenvalue decomposition; target speaker voice activity detection; speaker activity detection; MATRIX; ALGORITHM; EVD;
DOI
10.1109/IWAENC53105.2022.9914796
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and reduce the computational load of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing across multiple microphones to compute the syndrome energy, which is used to test for the presence or absence of a target speaker. The proposed approach is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR) from -10 dB to 20 dB.
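The abstract reports results as F1 and balanced accuracy scores. As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed from frame-level binary decisions (1 = target speaker active, 0 = inactive). The function name `vad_scores` and the toy label sequences are hypothetical and are not taken from the paper; the paper's own evaluation setup may differ.

```python
from typing import Dict, Sequence


def vad_scores(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Frame-level F1 and balanced accuracy for binary VAD decisions.

    Hypothetical helper for illustration only; labels are 1 (target
    speaker active) or 0 (inactive), one label per frame.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly detected speech
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # correctly rejected frames
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed speech

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate

    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    balanced_accuracy = 0.5 * (recall + specificity)
    return {"f1": f1, "balanced_accuracy": balanced_accuracy}


# Toy example (made-up labels): 3 speech frames, 5 non-speech frames.
reference = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
scores = vad_scores(reference, predicted)
print(scores)
```

Balanced accuracy averages the true positive and true negative rates, so it is less sensitive than plain accuracy to an imbalance between speech and non-speech frames.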
Pages: 5