Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0

Authors:

Khan, Faheem [1]

Milner, Ben [1]

Affiliations:

[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that estimate the quality and intelligibility of the extracted target speaker and a comparison is made of different perceptual gain transforms. These show that significant gains are achieved by the application of the perceptual gain function.
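The filtering step described in the abstract can be illustrated with a minimal sketch. This record gives no equations, so everything below is an assumption: the per-bin power estimates for the target and interfering speaker are taken as given inputs (in the paper they would be derived from visual features), the Wiener gain uses the textbook form of target power over total power, and `power_law_transform` is a hypothetical stand-in for the perceptual gain transform, not the authors' function.

```python
import numpy as np


def wiener_gain(target_psd, interferer_psd, floor=1e-12):
    """Textbook per-bin Wiener gain: target power divided by total power."""
    target_psd = np.asarray(target_psd, dtype=float)
    interferer_psd = np.asarray(interferer_psd, dtype=float)
    # The floor avoids division by zero in bins where both estimates are zero.
    return target_psd / np.maximum(target_psd + interferer_psd, floor)


def power_law_transform(gain, exponent=2.0):
    """Hypothetical non-linear gain adjustment (assumed, not from the paper).

    An exponent above 1 suppresses bins dominated by the interferer more
    strongly than the plain Wiener gain does.
    """
    return np.clip(gain, 0.0, 1.0) ** exponent


def separate_frame(mixture_spectrum, target_psd, interferer_psd, exponent=2.0):
    """Apply the adjusted gain to one frame of the mixture spectrum."""
    gain = power_law_transform(wiener_gain(target_psd, interferer_psd), exponent)
    return gain * np.asarray(mixture_spectrum)
```

For example, a bin with target power 3 and interferer power 1 receives a Wiener gain of 0.75, which the assumed square-law transform reduces to 0.5625.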


Pages: 3263-3267

Number of pages: 5
