Using audio and visual information for single channel speaker separation

Cited by: 0
Authors
Khan, Faheem [1, 2]
Milner, Ben [1]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
[2] Univ Sci & Technol, Dept Software Engn, Bannu, Pakistan
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015
Keywords
Speaker separation; soft mask; visual features; audio-visual correlation; SPEECH SEPARATION; RECOGNITION
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single channel and includes the mixture of speakers, and a separate set of visual features is extracted from each speaker. This allows modification of the separation process to include not only the audio speech but also visual speech from each speaker in the mixture. Experimental results are presented that compare the proposed audio-visual speaker separation with audio-only and visual-only methods using both speech quality and speech intelligibility metrics.
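To make the soft mask idea in the abstract concrete, the following is a minimal sketch of a generic Wiener-style ratio mask applied to a single-channel mixture spectrogram. It is not the authors' formulation: the function names (`soft_mask`, `apply_mask`), the toy values, and the assumption that per-bin power estimates for the target and the interfering speaker are already available (whether derived from audio, visual features, or both) are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_mask(target_power, interferer_power, eps=1e-12):
    """Return a soft mask in [0, 1] for every time-frequency bin.

    Each value is the estimated share of the bin's power that belongs
    to the target speaker (a generic ratio mask, assumed for illustration).
    """
    target_power = np.asarray(target_power, dtype=float)
    interferer_power = np.asarray(interferer_power, dtype=float)
    # eps avoids division by zero when both estimates are zero.
    return target_power / (target_power + interferer_power + eps)


def apply_mask(mixture_spec, mask):
    """Scale each bin of the complex mixture spectrogram by the mask."""
    return mask * mixture_spec


# Toy 2x2 "spectrograms" (rows: frequency bins, columns: frames).
target_est = np.array([[4.0, 0.0], [1.0, 9.0]])      # hypothetical target power
interferer_est = np.array([[0.0, 4.0], [1.0, 1.0]])  # hypothetical interferer power
mixture = np.array([[2.0 + 0j, 2.0 + 0j], [1.0 + 1.0j, 3.0 + 0j]])

mask = soft_mask(target_est, interferer_est)
separated = apply_mask(mixture, mask)
```

Bins dominated by the target keep most of their energy, bins dominated by the interferer are attenuated, and bins with equal estimates are halved. In a full system the masked spectrogram would then be inverted back to a waveform.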
Pages: 1517-1521
Page count: 5