Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech

Cited by: 0

Authors:

Das, Rohan Kumar [1]

Yang, Jichen [1]

Li, Haizhou [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Keywords:

DIARIZATION; RECOGNITION; SYSTEM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Speaker verification in a multi-speaker environment is an emerging research topic. Speaker clustering, which separates multiple speakers, can be effective if either a predetermined threshold or the number of speakers present in a multi-speaker utterance is given. In practice, however, neither is available. This work proposes to handle this problem by introducing a penalty distance factor into the pipeline of traditional clustering techniques. The proposed framework first uses traditional clustering techniques to form speaker clusters for a given number of speakers. A penalty distance based on the Bayesian information criterion is then computed and used to merge similar clusters in a multi-speaker utterance. The studies are conducted on the Speakers in the Wild (SITW) and recent NIST SRE 2018 databases, which contain multi-speaker conversational speech in noisy environments. The results show the effectiveness of the proposed penalty-distance-based refinement in such a scenario.
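The abstract does not give the exact form of the penalty distance, so the following is only a rough sketch of the general idea: over-cluster first, then merge clusters that a Bayesian information criterion (BIC) test judges to be the same speaker. It assumes the widely used delta-BIC criterion with one diagonal-covariance Gaussian per cluster, which may differ from the paper's formulation. The names `delta_bic`, `merge_similar_clusters`, and the penalty weight `lam` are hypothetical and chosen for illustration.

```python
import math
import random


def _log_det_diag(frames):
    """Log-determinant of a diagonal covariance: sum of log per-dimension variances."""
    n = len(frames)
    total = 0.0
    for column in zip(*frames):
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        total += math.log(var + 1e-6)  # small floor avoids log(0)
    return total


def delta_bic(a, b, lam=1.0):
    """Delta-BIC between two clusters (assumed form, not taken from the paper).

    Negative values favour a single shared Gaussian, i.e. merging.
    """
    a, b = list(a), list(b)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    d = len(a[0])
    # Log-likelihood gain of modelling the clusters separately.
    gain = 0.5 * (n * _log_det_diag(a + b)
                  - n1 * _log_det_diag(a)
                  - n2 * _log_det_diag(b))
    # A diagonal Gaussian has 2*d free parameters (means and variances).
    penalty = 0.5 * (2 * d) * math.log(n)
    return gain - lam * penalty


def merge_similar_clusters(clusters, lam=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # every remaining pair looks like two different speakers
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy usage: three initial clusters, two of them drawn from the same source.
rng = random.Random(0)


def sample(mean, count, dim=4):
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


initial = [sample(0.0, 200), sample(0.0, 200), sample(5.0, 200)]
refined = merge_similar_clusters(initial)
print(len(initial), "->", len(refined), "clusters")
```

In this toy setup the initial clustering deliberately assumes more speakers than are present, and the merge step is what reduces the count. In a real system the inputs would be speaker embeddings or acoustic features per segment rather than synthetic Gaussian samples, and `lam` would need tuning on development data.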


Pages: 1630-1635

Number of pages: 6

