Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities

Cited by: 29

Authors:

Siatras, Spyridon ^{[1]}

Nikolaidis, Nikos ^{[1]}

Krinidis, Michail ^{[1]}

Pitas, Ioannis ^{[1]}

Affiliation:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2009 / Vol. 19 / Issue 01

Keywords:

Speaker detection; visual speech detection; SPEECH; FEATURES;

DOI:

10.1109/TCSVT.2008.2009262

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification code:

0808 ; 0809 ;

Abstract:

In this letter, we introduce a novel approach for lip activity detection and speaker detection, using solely visual information. The main idea in this work is to apply signal detection algorithms to a simple and easily extracted feature from the mouth region. We argue that the increased average value and standard deviation of the number of pixels with low intensities that the mouth region of a speaking person demonstrates can be used as visual cues for detecting visual speech. We then proceed to derive a statistical algorithm that utilizes this fact for the efficient characterization of visual speech and silence in video sequences. Furthermore, we employ the lip activity detection method in order to determine the active speaker(s) in a multi-person environment.
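The feature described in the abstract (the number of low-intensity pixels in the mouth region, tracked over time) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the intensity threshold of 60, the tiny hand-made frames, and above all the simple two-threshold decision rule, which stands in for the statistical detection algorithm the authors derive.

```python
from statistics import mean, pstdev


def count_dark_pixels(mouth_region, intensity_threshold=60):
    """Count pixels in a grayscale mouth region darker than the threshold.

    mouth_region is a list of rows of 0-255 intensities.
    The threshold value is an assumption, not a value from the paper.
    """
    return sum(
        1 for row in mouth_region for value in row if value < intensity_threshold
    )


def lip_activity_statistics(frames, intensity_threshold=60):
    """Return (mean, standard deviation) of the dark-pixel count over a window."""
    counts = [count_dark_pixels(frame, intensity_threshold) for frame in frames]
    return mean(counts), pstdev(counts)


def is_speaking(frames, mean_threshold, std_threshold, intensity_threshold=60):
    """Illustrative rule only: flag speech when both cues exceed fixed thresholds.

    This is a stand-in for the paper's statistical detector, not a copy of it.
    """
    average, deviation = lip_activity_statistics(frames, intensity_threshold)
    return average > mean_threshold and deviation > std_threshold


# Hypothetical 2x3 mouth regions: a closed mouth is mostly bright skin and lips,
# an open mouth exposes the dark oral cavity.
closed_mouth = [[200, 200, 200], [200, 50, 200]]
open_mouth = [[200, 40, 200], [30, 20, 40]]

silent_window = [closed_mouth] * 4
speaking_window = [closed_mouth, open_mouth, closed_mouth, open_mouth]

print(is_speaking(silent_window, mean_threshold=2.0, std_threshold=1.0))
print(is_speaking(speaking_window, mean_threshold=2.0, std_threshold=1.0))
```

In this toy data the silent window has a constant dark-pixel count, while the speaking window alternates between low and high counts, so both its mean and its standard deviation are larger, which is the cue the abstract refers to.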


Pages: 133-137

Page count: 5

50 items in total

[41] Emotions Don't Lie: An Audio-Visual Deepfake Detection Method using Affective Cues
Mittal, Trisha
Bhattacharya, Uttaran
Chandra, Rohan
Bera, Aniket
Manocha, Dinesh
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2823 - 2832
[42] Abnormal human activity detection by convolutional recurrent neural network using fuzzy logic
Kumar, Manoj
Biswas, Mantosh
MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (22) : 61843 - 61859
[43] A voice activity detection algorithm in spectro-temporal domain using sparse representation
Eshaghi, Mohadese
Razzazi, Farbod
Behrad, Alireza
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (07) : 1791 - 1803
[44] Blind Spatial Sound Source Clustering and Activity Detection Using Uncalibrated Microphone Array
Nakamura, Keisuke
Mizumoto, Takeshi
2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 2438 - 2442
[45] Utilizing gammatone filter coefficient to improve human mouth-click signal detection using a multi-phase correlation process
Saleh, Nur Luqman
Sali, Aduwati
Abdullah, Raja Syamsul Azmir Raja
Ahmad, Sharifah M. Syed
Liew, Jiun Terng
Hashim, Fazirulhisyam
Abdullah, Fairuz
Rashid, Nur Emileen Abdul
MEASUREMENT, 2024, 224
[46] fMRI STUDY OF GRADUATED EMOTIONAL CHARGE FOR DETECTION OF COVERT ACTIVITY USING PASSIVE LISTENING TO NARRATIVES
Sontheimer, Anna
Vassal, Francois
Jean, Betty
Feschet, Fabien
Lubrano, Vincent
Lemaire, Jean-Jacques
NEUROSCIENCE, 2017, 349 : 291 - 302
[47] Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams
Hou, Yuanbo
Yu, Zhesong
Liang, Xia
Du, Xingjian
Zhu, Bilei
Ma, Zejun
Botteldooren, Dick
INTERSPEECH 2021, 2021, : 321 - 325
[48] Detection of glottal closure instant and glottal open region from speech signals using spectral flatness measure
Kadiri, Sudarsana Reddy
Prasad, RaviShankar
Yegnanarayana, B.
SPEECH COMMUNICATION, 2020, 116 : 30 - 43
[49] Presentation of a Segmentation Method for a Diabetic Retinopathy Patient's Fundus Region Detection Using a Convolutional Neural Network
Valizadeh, Amin
Ghoushchi, Saeid Jafarzadeh
Ranjbarzadeh, Ramin
Pourasad, Yaghoub
COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
[50] Visual Saliency Using Binary Spectrum of Walsh-Hadamard Transform and Its Applications to Ship Detection in Multispectral Imagery
Yu, Ying
Yang, Jian
NEURAL PROCESSING LETTERS, 2017, 45 (03) : 759 - 776
