Using Visual Speech Information in Masking Methods for Audio Speaker Separation

Cited by: 7
Authors
Khan, Faheem Ullah [1]
Milner, Ben P. [1]
Le Cornu, Thomas [1]
Affiliation
[1] Univ East Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Keywords
Speaker separation; audio-visual processing; binary masks; ratio mask; ENHANCEMENT; NOISE; INTELLIGIBILITY; SEGREGATION; PREDICTION; FREQUENCY; TRACKING;
DOI
10.1109/TASLP.2018.2835719
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper examines whether visual speech information can be effective within audio-masking-based speaker separation to improve the quality and intelligibility of the target speech. Two visual-only methods of generating an audio mask for speaker separation are first developed. These use a deep neural network to map the visual speech features to an audio feature space from which both visually derived binary masks and visually derived ratio masks are estimated, before application to the speech mixture. Second, an audio ratio masking method forms a baseline approach for speaker separation which is extended to exploit visual speech information to form audio-visual ratio masks. Speech quality and intelligibility tests are carried out on the visual-only, audio-only, and audio-visual masking methods of speaker separation at mixing levels from -10 to +10 dB. These reveal substantial improvements in the target speech when applying the visual-only and audio-only masks, but with highest performance occurring when combining audio and visual information to create the audio-visual masks.
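The binary and ratio masks mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the paper estimates masks from features predicted by a deep neural network, whereas the sketch below computes oracle masks directly from known target and interferer magnitudes. The function names, the 0 dB threshold, the power-ratio form of the ratio mask, and the toy 2x2 spectrogram values are all assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def binary_mask(target_mag, interferer_mag, threshold_db=0.0):
    """Return 1.0 where the local target-to-interferer ratio exceeds
    threshold_db, else 0.0 (assumed threshold, not from the paper)."""
    local_snr_db = 20.0 * np.log10((target_mag + EPS) / (interferer_mag + EPS))
    return (local_snr_db > threshold_db).astype(float)


def ratio_mask(target_mag, interferer_mag):
    """Return a soft mask in [0, 1]: target power over total power.
    This is one common formulation; other definitions exist."""
    target_pow = target_mag ** 2
    interferer_pow = interferer_mag ** 2
    return target_pow / (target_pow + interferer_pow + EPS)


def apply_mask(mask, mixture_mag):
    """Scale each time-frequency cell of the mixture by the mask."""
    return mask * mixture_mag


# Toy magnitude spectrograms (rows: frequency bins, columns: frames).
target = np.array([[1.0, 0.1], [0.5, 2.0]])
interferer = np.array([[0.2, 1.0], [0.5, 0.5]])
mixture = target + interferer  # crude additive approximation of the mix

ibm = binary_mask(target, interferer)
irm = ratio_mask(target, interferer)
separated_binary = apply_mask(ibm, mixture)
separated_ratio = apply_mask(irm, mixture)
```

The binary mask keeps or discards each time-frequency cell outright, while the ratio mask attenuates cells in proportion to how much of their energy belongs to the target.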
Pages: 1742-1754
Number of pages: 13