Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast

Cited by: 0
Authors
Poignant, Johann [1]
Bredin, Herve
Le, Viet Bac
Besacier, Laurent [1]
Barras, Claude
Quenot, Georges [1]
Affiliations
[1] UJF Grenoble 1, UPMF Grenoble 2, Grenoble INP, CNR, LIG UMR 5217, F-38041 Grenoble, France
Source
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012
Keywords
unsupervised speaker identification; multimodal fusion; speaker diarization; optical character recognition; reproducible results; DIARIZATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose an approach for unsupervised speaker identification in TV broadcast videos that combines acoustic speaker diarization with person names obtained via video OCR from overlaid texts. Three methods for propagating the overlaid names to the speech turns are compared. They take into account the co-occurrence duration between the speaker clusters and the names provided by the video OCR, and use a task-adapted variant of the TF-IDF information retrieval coefficient. These methods were tested on the REPERE dry-run evaluation corpus, which contains 3 hours of annotated videos. Our best unsupervised system reaches an F-measure of 70.2% when considering all the speakers, and 81.7% if anchor speakers are left out. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models, trained on matching development data and additional TV and radio data, only provided a 57.5% F-measure when considering all the speakers and 45.7% without anchors.
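As a rough illustration of the co-occurrence idea mentioned in the abstract, the sketch below assigns to each speaker cluster the overlaid name it overlaps with longest in time. It is not the authors' implementation: the data layout, the function names (`overlap`, `cooccurrence`, `name_clusters`) and the simple arg-max decision rule are all assumptions made for this example, and the task-adapted TF-IDF weighting is omitted because the abstract does not define it.

```python
from collections import defaultdict


def overlap(a, b):
    """Duration in seconds shared by two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def cooccurrence(speech_turns, ocr_names):
    """Accumulate overlap duration for every (cluster, name) pair.

    speech_turns: list of (start, end, cluster_id) from diarization.
    ocr_names:    list of (start, end, name) from video OCR.
    Both layouts are hypothetical, chosen for this sketch.
    """
    table = defaultdict(lambda: defaultdict(float))
    for start, end, cluster in speech_turns:
        for n_start, n_end, name in ocr_names:
            duration = overlap((start, end), (n_start, n_end))
            if duration > 0:
                table[cluster][name] += duration
    return table


def name_clusters(speech_turns, ocr_names):
    """Give each cluster the name with the longest total co-occurrence.

    Clusters that never overlap an overlaid name stay unnamed
    (they are simply absent from the returned dictionary).
    """
    table = cooccurrence(speech_turns, ocr_names)
    return {
        cluster: max(names, key=names.get)
        for cluster, names in table.items()
    }


if __name__ == "__main__":
    # Invented toy data: two speaker clusters and two overlaid names.
    turns = [(0.0, 10.0, "spk1"), (10.0, 25.0, "spk2"), (25.0, 40.0, "spk1")]
    names = [
        (12.0, 16.0, "Alice Martin"),
        (26.0, 31.0, "Bob Durand"),
        (9.0, 11.0, "Bob Durand"),
    ]
    print(name_clusters(turns, names))
```

Because the decision is made per cluster, every speech turn of a cluster inherits the same name, including turns during which no text is shown on screen.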
Pages: 2649-2652
Page count: 4