Estimation of speaker position using audio information

Cited by: 0

Authors:

Vahedian, A [1]

Frater, M [1]

Arnold, J [1]

Cavenor, M [1]

Godara, L [1]

Pickering, M [1]

Affiliation:

[1] Univ New S Wales, Sch Elect Engn, Australian Def Force Acad, Canberra, ACT, Australia

Source:

IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS | 1997

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Real-time conversational video telecommunications services, such as video-conferencing, are becoming ever more important as a substitute for face-to-face meetings. One of the perceived weaknesses of existing services is the picture quality achieved, especially around the face of a speaker. A possible solution would be to identify the location of the face, which is then transmitted at a higher quality than the rest of the picture. In this paper, we present a new technique for identifying the face using an array of microphones. Unlike other techniques proposed so far, which make assumptions about the content of the video material, the idea relies on estimating the lip position by processing the audio of the speaker's speech. Once this estimate is available, a two- or possibly three-stage quantisation of the video information allows the subjectively more important parts, i.e. the face of the speaker, to be compressed with lower distortion. This new technique, which is compatible with all existing video compression standards, is much cheaper and easier to implement than previous techniques.
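
The abstract does not say how the microphone array signals are processed, so the following is only an illustrative sketch of one common approach and not the authors' method: estimating the time difference of arrival between two microphones by cross-correlation, then converting it to a bearing under a far-field assumption. The function names, the speed of sound value, the microphone spacing and the synthetic test signal are all assumptions made for this example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_delay(sig_a, sig_b, sample_rate, max_lag):
    """Delay of sig_b relative to sig_a in seconds (positive when sig_b lags).

    Brute-force cross-correlation over lags in [-max_lag, max_lag] samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)
        stop = min(len(sig_a), len(sig_b) - lag)
        score = sum(sig_a[i] * sig_b[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def bearing_from_delay(delay_s, mic_spacing_m):
    """Far-field bearing in degrees from broadside of a two-microphone pair."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against delays beyond the physical limit
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    random.seed(0)
    sample_rate = 16000  # Hz; hypothetical sampling rate
    source = [random.gauss(0.0, 1.0) for _ in range(512)]
    delayed = [0.0] * 5 + source[:-5]  # second microphone hears the source 5 samples later
    tau = estimate_delay(source, delayed, sample_rate, max_lag=20)
    print("delay in samples:", tau * sample_rate)
    print("bearing in degrees:", bearing_from_delay(tau, mic_spacing_m=0.2))
```

A bearing obtained this way could then be mapped to an image region that receives a finer quantiser step than the background, which is the role the abstract assigns to the multi-stage quantisation; that mapping depends on camera geometry not given in this record.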

Pages: 181-184

Number of pages: 4
