Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities

Cited by: 29
Authors
Siatras, Spyridon [1]
Nikolaidis, Nikos [1]
Krinidis, Michail [1]
Pitas, Ioannis [1]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
Speaker detection; visual speech detection; SPEECH; FEATURES
DOI
10.1109/TCSVT.2008.2009262
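Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809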
Abstract
In this letter, we introduce a novel approach for lip activity detection and speaker detection using solely visual information. The main idea is to apply signal detection algorithms to a simple, easily extracted feature of the mouth region. We argue that the mouth region of a speaking person exhibits an increased average value and standard deviation of the number of low-intensity pixels, and that these increases can be used as visual cues for detecting visual speech. We then derive a statistical algorithm that exploits this fact to efficiently characterize visual speech and silence in video sequences. Furthermore, we employ the lip activity detection method to determine the active speaker(s) in a multi-person environment.
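The feature described in the abstract can be illustrated with a minimal sketch. It is not the statistical detector derived in the letter. It only counts low-intensity pixels per mouth-region frame and compares windowed mean and standard deviation against a silence baseline. The function names, the intensity threshold of 60, the window length, and the factor of 1.5 are all assumptions made for illustration.

```python
import numpy as np


def low_intensity_counts(mouth_frames, intensity_threshold=60):
    """Count the pixels darker than intensity_threshold in each frame.

    mouth_frames: array of shape (num_frames, height, width), grayscale.
    intensity_threshold is an assumed value, not taken from the letter.
    """
    frames = np.asarray(mouth_frames)
    dark = frames < intensity_threshold
    return dark.reshape(frames.shape[0], -1).sum(axis=1)


def window_stats(counts, window=25):
    """Mean and standard deviation of the counts over sliding windows."""
    counts = np.asarray(counts, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(counts, window)
    return windows.mean(axis=1), windows.std(axis=1)


def detect_lip_activity(counts, silence_mean, silence_std, window=25, factor=1.5):
    """Flag windows whose mean or std exceeds a scaled silence baseline.

    This simple threshold rule is a stand-in (assumption) for the
    statistical signal detection algorithm the abstract refers to.
    """
    mean, std = window_stats(counts, window)
    return (mean > factor * silence_mean) | (std > factor * silence_std)
```

For the multi-person case mentioned in the abstract, one could run `detect_lip_activity` on each person's mouth region separately and treat the people whose windows are flagged as candidate active speakers. That is again an illustrative assumption rather than the letter's procedure.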
Pages: 133-137
Page count: 5