Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities

Cited by: 29

Authors:

Siatras, Spyridon [1]

Nikolaidis, Nikos [1]

Krinidis, Michail [1]

Pitas, Ioannis [1]

Affiliations:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2009 / Vol. 19 / No. 1

Keywords:

Speaker detection; visual speech detection; SPEECH; FEATURES;

DOI:

10.1109/TCSVT.2008.2009262

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this letter, we introduce a novel approach for lip activity detection and speaker detection using solely visual information. The main idea in this work is to apply signal detection algorithms to a simple and easily extracted feature from the mouth region. We argue that the increased average value and standard deviation of the number of pixels with low intensities that the mouth region of a speaking person demonstrates can be used as visual cues for detecting visual speech. We then proceed to derive a statistical algorithm that utilizes this fact for the efficient characterization of visual speech and silence in video sequences. Furthermore, we employ the lip activity detection method in order to determine the active speaker(s) in a multi-person environment.
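
The feature described in the abstract (the number of low-intensity pixels in the mouth region, tracked through its average and standard deviation over time) can be illustrated with a rough sketch. This is not the statistical algorithm derived in the paper. The function names, the intensity threshold, the window length, and the simple "either statistic exceeds a threshold" decision rule are all assumptions chosen for illustration, and the frames are synthetic.

```python
from statistics import mean, pstdev


def low_intensity_count(mouth_region, intensity_threshold=60):
    """Count pixels in one grayscale mouth region darker than the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return sum(1 for row in mouth_region for pixel in row
               if pixel < intensity_threshold)


def windowed_stats(counts, window):
    """Mean and population standard deviation over each sliding window."""
    return [
        (mean(counts[i:i + window]), pstdev(counts[i:i + window]))
        for i in range(len(counts) - window + 1)
    ]


def lip_activity(counts, window, mean_threshold, std_threshold):
    """Flag a window as 'speaking' when either statistic exceeds its threshold.

    This plain thresholding rule is a stand-in assumption for the paper's
    statistical detector.
    """
    return [m > mean_threshold or s > std_threshold
            for m, s in windowed_stats(counts, window)]


# Synthetic 2x2 "mouth regions": a closed mouth has no dark pixels,
# an open mouth exposes two dark pixels.
closed_mouth = [[200, 190], [180, 170]]
open_mouth = [[200, 30], [20, 170]]

frames = [closed_mouth] * 5 + [open_mouth, closed_mouth, open_mouth]
counts = [low_intensity_count(frame) for frame in frames]
flags = lip_activity(counts, window=4, mean_threshold=0.5, std_threshold=0.5)

print(counts)
print(flags)
```

In this toy sequence the windows covering only closed-mouth frames stay unflagged, while windows that include the alternating open and closed frames are flagged, because both the average count and its spread rise. For the multi-person case mentioned in the abstract, the same per-window decision could in principle be run on each detected mouth region separately.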


Pages: 133-137

Page count: 5
