Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0

Authors:

Khan, Faheem [1]

Milner, Ben [1]

Affiliations:

[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that estimate the quality and intelligibility of the extracted target speaker and a comparison is made of different perceptual gain transforms. These show that significant gains are achieved by the application of the perceptual gain function.
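The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that per-frequency-bin power estimates for the target and the interfering speaker are already available (in the paper these are derived from visual features; here they are simply given as lists). The function names, the small denominator floor, and the power-law adjustment are hypothetical stand-ins chosen for illustration; the record does not specify the paper's perceptual gain transforms.

```python
def wiener_gains(target_power, interferer_power, floor=1e-12):
    """Textbook Wiener gain per frequency bin: G = P_t / (P_t + P_i).

    `floor` is an assumed safeguard against division by zero.
    """
    if len(target_power) != len(interferer_power):
        raise ValueError("power spectra must have the same number of bins")
    return [t / max(t + i, floor) for t, i in zip(target_power, interferer_power)]


def power_law_transform(gains, exponent=2.0):
    """Hypothetical non-linear gain adjustment (NOT the paper's transform).

    With exponent > 1, gains below 1 are pushed further towards 0,
    suppressing bins dominated by the interfering speaker more strongly.
    """
    return [g ** exponent for g in gains]


def apply_gains(mixture_magnitude, gains):
    """Scale each bin of the mixture magnitude spectrum by its gain."""
    return [m * g for m, g in zip(mixture_magnitude, gains)]


# Toy three-bin example: bin 0 is target-dominated, bin 2 interferer-dominated.
target_power = [4.0, 1.0, 0.0]
interferer_power = [0.0, 1.0, 4.0]

gains = wiener_gains(target_power, interferer_power)
adjusted = power_law_transform(gains)
filtered = apply_gains([2.0, 2.0, 2.0], adjusted)
```

In this toy case the bin where both speakers have equal power receives a Wiener gain of 0.5, which the assumed power-law adjustment lowers to 0.25, while the bins dominated by a single speaker keep gains of 1 and 0.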

Pages: 3263-3267

Number of pages: 5
