Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0
Authors
Khan, Faheem [1 ]
Milner, Ben [1 ]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; recognition;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results evaluate the quality and intelligibility of the extracted target speech, and different perceptual gain transforms are compared. These show that applying the perceptual gain function yields significant improvements.
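The abstract's pipeline can be illustrated with a minimal sketch: a frequency-domain Wiener gain computed from estimated target and interferer power spectra, followed by a non-linear perceptual adjustment. The paper does not specify the transform's form, so the exponent-plus-floor shape below (and the names `wiener_gain`, `perceptual_gain`, `alpha`, `floor`) are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def wiener_gain(target_psd, interferer_psd, eps=1e-10):
    """Per-frequency Wiener filter gain G(f) = S(f) / (S(f) + N(f)),
    where S and N are the estimated power spectra of the target
    speaker and the interfering speaker (in the paper, both are
    predicted from visual speech features)."""
    return target_psd / (target_psd + interferer_psd + eps)

def perceptual_gain(gain, alpha=2.0, floor=0.05):
    """Hypothetical perceptual transform (assumed form): raising the
    gain to a power > 1 suppresses low-confidence bins more strongly,
    while a spectral floor limits musical-noise artefacts."""
    return np.maximum(gain ** alpha, floor)

# Toy power spectra for a 512-point FFT (257 positive-frequency bins).
rng = np.random.default_rng(0)
target_psd = rng.random(257)
interferer_psd = rng.random(257)

g = wiener_gain(target_psd, interferer_psd)       # raw Wiener gains in [0, 1]
g_perc = perceptual_gain(g)                       # perceptually adjusted gains
```

In practice the adjusted gains would be applied to the mixture's short-time spectrum frame by frame before inverse-transforming back to the time domain.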
Pages: 3263-3267
Page count: 5
Related Papers
50 records in total
  • [21] An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
    Michelsanti, Daniel
    Tan, Zheng-Hua
    Zhang, Shi-Xiong
    Xu, Yong
    Yu, Meng
    Yu, Dong
    Jensen, Jesper
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1368 - 1396
  • [22] Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation
    Zhang Long
    Xu Xu
    Chen Huang
    Chen Jiaxu
    Ye Zhongfu
    SPEECH COMMUNICATION, 2018, 97 : 1 - 8
  • [23] FORMANT-GAPS FEATURES FOR SPEAKER VERIFICATION USING WHISPERED SPEECH
    Naini, Abinay Reddy
Rao, Achuth M. V.
    Ghosh, Prasanta Kumar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6231 - 6235
  • [24] SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction
    Zhang, Jie
    Tao, Rui
    Du, Jun
    Dai, Li-Rong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3176 - 3189
  • [25] A novel moving window-based power spectrum features for single-channel EEG classification using machine learning
    Alqudah, Ali Mohammad
    Qazan, Shoroq
    Obeidat, Yusra M.
    ACTA SCIENTIARUM-TECHNOLOGY, 2023, 45
  • [26] Robust Speech-Distortion Weighted Interframe Wiener Filters for Single-Channel Noise Reduction
    Andersen, Kristian Timm
    Moonen, Marc
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (01) : 97 - 107
  • [27] Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks
    Grais, Emad M.
    Erdogan, Hakan
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1784 - 1787
  • [28] Emotion Classification Using Single-Channel Scalp-EEG Recording
    Jalilifard, Amir
    Pizzolato, Ednaldo Brigante
    Islam, Md Kafiul
    2016 38TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2016, : 845 - 849
  • [29] Extraction and Analysis of Speech Emotion Features Using Hybrid Punjabi Audio Dataset
    Kaur, Kamaldeep
    Singh, Parminder
    SOFT COMPUTING AND ITS ENGINEERING APPLICATIONS, ICSOFTCOMP 2022, 2023, 1788 : 275 - 287
  • [30] QoE Estimation of WebRTC-based Audio-visual Conversations from Facial and Speech Features
    Bingol, Gulnaziye
    Porcu, Simone
    Floris, Alessandro
    Atzori, Luigi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)