Speaker Separation Using Visual Speech Features and Single-channel Audio

Times cited: 0
Authors
Khan, Faheem [1]
Milner, Ben [1]
Affiliation
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work proposes a single-channel speaker separation method that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input together with visual features extracted from the mouth region of each speaker in the mixture. The speakers' visual information is used to create a visually derived Wiener filter, whose gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results estimate the quality and intelligibility of the extracted target speech and compare different perceptual gain transforms. These results show that applying the perceptual gain transform yields significant improvements.
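The abstract does not give the paper's equations, so the following is only a minimal sketch, under stated assumptions, of the general idea: per-bin power estimates for each speaker drive a Wiener gain G = S / (S + N), which is then reshaped non-linearly before being applied to the mixture spectrum. The names `visual_to_psd`, `power_transform` and `sigmoid_transform` are hypothetical placeholders: the linear visual-to-power map stands in for whatever trained model the authors use, and the two transforms are illustrative examples, not the perceptual gain transforms compared in the paper. It also assumes the speakers are independent, so their powers add in each frequency bin.

```python
import math
from typing import Callable, List, Sequence

EPS = 1e-12  # guards against division by zero and non-positive power estimates


def visual_to_psd(visual: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """HYPOTHETICAL stand-in for a trained visual-to-audio model: a linear map
    from one speaker's mouth-region feature vector to per-bin power estimates."""
    out = []
    for row, b in zip(weights, bias):
        value = sum(w * v for w, v in zip(row, visual)) + b
        out.append(max(value, EPS))  # power must stay positive
    return out


def wiener_gain(target_psd: Sequence[float],
                interferer_psd: Sequence[float]) -> List[float]:
    """Per-bin Wiener gain G = S / (S + N)."""
    return [s / max(s + n, EPS) for s, n in zip(target_psd, interferer_psd)]


def power_transform(gain: Sequence[float], p: float = 2.0) -> List[float]:
    """HYPOTHETICAL perceptual transform: G**p (p > 1 attenuates low gains more)."""
    return [g ** p for g in gain]


def sigmoid_transform(gain: Sequence[float], alpha: float = 10.0,
                      beta: float = 0.5) -> List[float]:
    """HYPOTHETICAL perceptual transform: a logistic curve centred on beta that
    pushes gains toward 0 or 1, approaching a binary mask as alpha grows."""
    return [1.0 / (1.0 + math.exp(-alpha * (g - beta))) for g in gain]


def separate_frame(mixture_mag: Sequence[float],
                   psd_per_speaker: Sequence[Sequence[float]],
                   target: int,
                   transform: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Filter one frame of the mixture magnitude spectrum toward `target`."""
    target_psd = psd_per_speaker[target]
    # Assumes independent speakers: interferer power is the sum over all others.
    interferer_psd = [
        sum(psd[k] for i, psd in enumerate(psd_per_speaker) if i != target)
        for k in range(len(target_psd))
    ]
    gain = transform(wiener_gain(target_psd, interferer_psd))
    return [x * g for x, g in zip(mixture_mag, gain)]


# Toy example: two speakers, three frequency bins.
psds = [[4.0, 1.0, 0.0], [1.0, 1.0, 2.0]]
print(separate_frame([1.0, 1.0, 1.0], psds, target=0, transform=power_transform))
# ≈ [0.64, 0.25, 0.0]
```

Both example transforms push the soft Wiener gains toward a binary mask, suppressing bins dominated by the interfering speaker more aggressively; which shape, if either, resembles the transforms evaluated in the paper is not stated in the abstract.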
Pages: 3263-3267
Number of pages: 5