Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations

Cited by: 0
Authors
Inoue, Koji [1]
Wakabayashi, Yukoh [2]
Yoshimoto, Hiromasa [3]
Takanashi, Katsuya [3]
Kawahara, Tatsuya [1, 3]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
[2] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kyoto, Japan
[3] Kyoto Univ, Acad Ctr Comp & Media Studies, Kyoto 6068501, Japan
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
speaker diarization; backchannel; multi-modal; eye-gaze; poster conversation; FEATURES; RULES;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose multi-modal speaker diarization using acoustic and eye-gaze information in poster conversations. Eye-gaze information plays an important role in turn-taking and is therefore useful for predicting speech activity. In this paper, a variety of eye-gaze features are elaborated and combined with the acoustic information by a multi-modal integration model. Moreover, we introduce another model to detect backchannels, which involve different eye-gaze behaviors. This enhances the diarization result by filtering meaningful utterances such as questions and comments. Experimental evaluations in real poster sessions demonstrate that eye-gaze information contributes to improved diarization accuracy in noisy environments, and that its weight is automatically determined according to the signal-to-noise ratio (SNR).
Pages: 3086-3090
Page count: 5