Using audio and visual information for single channel speaker separation

Cited by: 0

Authors:

Khan, Faheem [1, 2]

Milner, Ben [1]

Affiliations:

[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England

[2] Univ Sci & Technol, Dept Software Engn, Bannu, Pakistan

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

Speaker separation; soft mask; visual features; audio-visual correlation; SPEECH SEPARATION; RECOGNITION;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single channel and includes the mixture of speakers, and a separate set of visual features is extracted from each speaker. This allows modification of the separation process to include not only the audio speech but also visual speech from each speaker in the mixture. Experimental results are presented that compare the proposed audio-visual speaker separation with audio-only and visual-only methods using both speech quality and speech intelligibility metrics.


Pages: 1517-1521

Number of pages: 5
