Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0
Authors
Khan, Faheem [1 ]
Milner, Ben [1 ]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; recognition;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results evaluate the quality and intelligibility of the extracted target speech, and different perceptual gain transforms are compared. These show that applying the perceptual gain function yields significant improvements.
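The abstract's pipeline can be illustrated with a minimal sketch: a frequency-domain Wiener gain computed from estimated target and interferer power spectra, followed by a non-linear perceptual adjustment. The paper does not specify the transform's form, so the exponent-plus-floor shape below (and the names `wiener_gain`, `perceptual_gain`, `alpha`, `floor`) are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def wiener_gain(target_psd, interferer_psd, eps=1e-10):
    """Per-frequency Wiener filter gain G(f) = S(f) / (S(f) + N(f)),
    where S and N are the estimated power spectra of the target
    speaker and the interfering speaker (in the paper, both are
    predicted from visual speech features)."""
    return target_psd / (target_psd + interferer_psd + eps)

def perceptual_gain(gain, alpha=2.0, floor=0.05):
    """Hypothetical perceptual transform (assumed form): raising the
    gain to a power > 1 suppresses low-confidence bins more strongly,
    while a spectral floor limits musical-noise artefacts."""
    return np.maximum(gain ** alpha, floor)

# Toy power spectra for a 512-point FFT (257 positive-frequency bins).
rng = np.random.default_rng(0)
target_psd = rng.random(257)
interferer_psd = rng.random(257)

g = wiener_gain(target_psd, interferer_psd)       # raw Wiener gains in [0, 1]
g_perc = perceptual_gain(g)                       # perceptually adjusted gains
```

In practice the adjusted gains would be applied to the mixture's short-time spectrum frame by frame before inverse-transforming back to the time domain.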
Pages: 3263-3267
Page count: 5
Related Papers
50 records in total
  • [21] An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
    Michelsanti, Daniel
    Tan, Zheng-Hua
    Zhang, Shi-Xiong
    Xu, Yong
    Yu, Meng
    Yu, Dong
    Jensen, Jesper
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1368 - 1396
  • [22] Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation
    Zhang Long
    Xu Xu
    Chen Huang
    Chen Jiaxu
    Ye Zhongfu
    SPEECH COMMUNICATION, 2018, 97 : 1 - 8
  • [23] FORMANT-GAPS FEATURES FOR SPEAKER VERIFICATION USING WHISPERED SPEECH
    Naini, Abinay Reddy
Rao, Achuth M. V.
    Ghosh, Prasanta Kumar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6231 - 6235
  • [24] SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction
    Zhang, Jie
    Tao, Rui
    Du, Jun
    Dai, Li-Rong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3176 - 3189
  • [25] A novel moving window-based power spectrum features for single-channel EEG classification using machine learning
    Alqudah, Ali Mohammad
    Qazan, Shoroq
    Obeidat, Yusra M.
    ACTA SCIENTIARUM-TECHNOLOGY, 2023, 45
  • [26] Robust Speech-Distortion Weighted Interframe Wiener Filters for Single-Channel Noise Reduction
    Andersen, Kristian Timm
    Moonen, Marc
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (01) : 97 - 107
  • [27] Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks
    Grais, Emad M.
    Erdogan, Hakan
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1784 - 1787
  • [28] Emotion Classification Using Single-Channel Scalp-EEG Recording
    Jalilifard, Amir
    Pizzolato, Ednaldo Brigante
    Islam, Md Kafiul
    2016 38TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2016, : 845 - 849
  • [29] Extraction and Analysis of Speech Emotion Features Using Hybrid Punjabi Audio Dataset
    Kaur, Kamaldeep
    Singh, Parminder
    SOFT COMPUTING AND ITS ENGINEERING APPLICATIONS, ICSOFTCOMP 2022, 2023, 1788 : 275 - 287
  • [30] QoE Estimation of WebRTC-based Audio-visual Conversations from Facial and Speech Features
    Bingol, Gulnaziye
    Porcu, Simone
    Floris, Alessandro
    Atzori, Luigi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)