Omnidirectional audio-visual talker localization based on dynamic fusion of audio-visual features using validity and reliability criteria

Cited by: 0

Authors:

Denda, Yuki ^{[1]}

Nishiura, Takanobu ^{[2]}

Yamashita, Yoichi ^{[2]}

Affiliations:

[1] Ritsumeikan Univ, Grad Sch Sci & Engn, Kusatsu 5258577, Japan

[2] Ritsumeikan Univ, Coll Informat Sci & Engn, Kusatsu 5258577, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2008 / Vol. E91D / No. 03

Keywords:

omnidirectional talker localization; dynamic fusion; DOA estimation; human position estimation; AV applications;

DOI:

10.1093/ietisy/e91-d.3.598

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer rests on two innovations. The first is a pair of robust omnidirectional audio and visual features: direction of arrival (DOA) estimation with an equilateral triangular microphone array extracts the audio feature, and human position estimation with an omnidirectional video camera extracts the visual feature. The second is a dynamic fusion of the AV features. A validity criterion, called the audio- or visual-localization counter, validates each audio or visual feature. A reliability criterion, called the speech arriving evaluator, acts as a dynamic weight so that the fusion procedure does not depend on prior statistical properties. The proposed localizer can perform both talker localization during speech activity and user localization during non-speech activity under an identical fusion rule. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the talker localization performance of the proposed AV localizer using the validity and reliability criteria is superior to that of conventional localizers.
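
The fusion idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the formulas for the localization counters or the speech arriving evaluator, so the function name `fuse_directions`, the boolean validity flags, and the single weight `speech_weight` in [0, 1] are all hypothetical stand-ins. The sketch only shows the general pattern of gating each feature by a validity flag and then blending the remaining directions with a dynamic weight, using a circular mean so that angles near 0 and 360 degrees combine sensibly.

```python
import math


def fuse_directions(audio_deg, visual_deg, audio_valid, visual_valid, speech_weight):
    """Blend an audio and a visual direction estimate (degrees, any range).

    Hypothetical illustration only:
      audio_valid / visual_valid stand in for a validity criterion,
      speech_weight in [0, 1] stands in for a reliability criterion
      (1.0 = trust audio fully, 0.0 = trust vision fully).
    Returns a direction in [0, 360), or None if no usable estimate exists.
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")

    # Validity gating: a feature that fails validation is ignored entirely.
    if not audio_valid and not visual_valid:
        return None
    if audio_valid and not visual_valid:
        return audio_deg % 360.0
    if visual_valid and not audio_valid:
        return visual_deg % 360.0

    # Both features are valid: weighted circular mean of the two directions.
    a = math.radians(audio_deg)
    v = math.radians(visual_deg)
    x = speech_weight * math.cos(a) + (1.0 - speech_weight) * math.cos(v)
    y = speech_weight * math.sin(a) + (1.0 - speech_weight) * math.sin(v)

    # Opposite directions with equal weight cancel out; no direction is defined.
    if math.hypot(x, y) < 1e-9:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0
```

With a high `speech_weight` the result follows the audio direction, which loosely mirrors talker localization during speech; with a low `speech_weight` it follows the visual direction, which loosely mirrors user localization during non-speech periods. Both cases go through the same blending rule.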

Pages: 598-606

Page count: 9
