Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0
Authors
Khan, Faheem [1]
Milner, Ben [1]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that estimate the quality and intelligibility of the extracted target speaker and a comparison is made of different perceptual gain transforms. These show that significant gains are achieved by the application of the perceptual gain function.
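The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that per-frequency power estimates for the target and the interfering speaker are already available (in the paper these are derived from visual features; here they are made-up numbers). The power-law adjustment is a hypothetical stand-in for a non-linear gain transform and is not the perceptual gain transform evaluated by the authors. All function names are illustrative.

```python
def wiener_gain(target_psd, interferer_psd, eps=1e-12):
    """Per-bin Wiener gain: target power divided by total mixture power."""
    return [t / (t + i + eps) for t, i in zip(target_psd, interferer_psd)]


def power_law_transform(gains, exponent=2.0):
    """Hypothetical non-linear gain adjustment (assumption, not the paper's).

    Gains are clipped to [0, 1]; an exponent above 1 attenuates bins
    dominated by the interferer more strongly than the plain Wiener gain.
    """
    return [min(max(g, 0.0), 1.0) ** exponent for g in gains]


def apply_gains(gains, mixture_spectrum):
    """Scale each complex mixture bin by its real-valued gain."""
    return [g * x for g, x in zip(gains, mixture_spectrum)]


# Toy single-frame example with three frequency bins (made-up values).
target_psd = [4.0, 1.0, 0.25]
interferer_psd = [1.0, 1.0, 1.0]
mixture_spectrum = [1 + 1j, 2 + 0j, 0.5 - 0.5j]

gains = wiener_gain(target_psd, interferer_psd)
adjusted = power_law_transform(gains)
separated = apply_gains(adjusted, mixture_spectrum)
```

In this toy frame the plain Wiener gains are roughly 0.8, 0.5 and 0.2, and squaring them pushes the interferer-dominated bins closer to zero while leaving the target-dominated bin comparatively intact.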
Pages: 3263-3267
Number of pages: 5