Speaker Separation Using Visual Speech Features and Single-channel Audio

Times cited: 0

Authors:

Khan, Faheem [1]

Milner, Ben [1]

Affiliation:

[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results estimate the quality and intelligibility of the extracted target speech and compare different perceptual gain transforms. They show that applying a perceptual gain transform gives significant improvements.
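The two-stage filtering described in the abstract (a Wiener filter followed by a non-linear perceptual gain transform) can be illustrated with a minimal sketch. It assumes that per-frequency-bin power estimates for the target and the interfering speaker are already available; the record does not say how the visual features are mapped to these estimates, so they are passed in directly as hypothetical inputs. The sigmoid used as the perceptual transform is also a stand-in chosen for illustration, not one of the transforms compared in the paper, and the names `wiener_gain`, `sigmoid_gain_transform` and `separate_frame` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence


def wiener_gain(target_power: Sequence[float],
                interferer_power: Sequence[float],
                floor: float = 1e-12) -> List[float]:
    """Per-bin Wiener gain G = P_target / (P_target + P_interferer).

    ASSUMPTION: the power estimates are given directly; in the paper they
    would be derived from visual features, which is not modelled here.
    """
    gains = []
    for pt, pi in zip(target_power, interferer_power):
        denom = pt + pi
        # Bins with (near) zero total power get zero gain to avoid dividing by 0.
        gains.append(pt / denom if denom > floor else 0.0)
    return gains


def sigmoid_gain_transform(gains: Sequence[float],
                           slope: float = 10.0,
                           midpoint: float = 0.5) -> List[float]:
    """HYPOTHETICAL perceptual transform: a sigmoid that pushes gains above
    `midpoint` towards 1 and gains below it towards 0."""
    return [1.0 / (1.0 + math.exp(-slope * (g - midpoint))) for g in gains]


def separate_frame(mixture_mag: Sequence[float],
                   target_power: Sequence[float],
                   interferer_power: Sequence[float],
                   transform: Optional[Callable[[Sequence[float]], List[float]]] = None
                   ) -> List[float]:
    """Apply the (optionally transformed) Wiener gains to one frame of
    mixture magnitudes. ASSUMPTION: the mixture phase is reused unchanged
    when resynthesising, so only magnitudes are filtered here."""
    gains = wiener_gain(target_power, interferer_power)
    if transform is not None:
        gains = transform(gains)
    return [g * m for g, m in zip(gains, mixture_mag)]


if __name__ == "__main__":
    target = [4.0, 1.0, 0.0]      # made-up target power per bin
    interferer = [1.0, 1.0, 2.0]  # made-up interferer power per bin
    mixture = [2.0, 2.0, 2.0]     # made-up mixture magnitudes
    print(wiener_gain(target, interferer))
    print(separate_frame(mixture, target, interferer, sigmoid_gain_transform))
```

For the made-up frame above, the plain Wiener gains are 0.8, 0.5 and 0.0. The sigmoid raises the first gain to about 0.95 and lowers nothing below zero, which shows the general idea of a non-linear gain adjustment: bins dominated by the target are passed more fully, while bins dominated by the interferer stay strongly attenuated.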


Pages: 3263-3267

Number of pages: 5
