Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities

Times cited: 29
Authors
Siatras, Spyridon [1]
Nikolaidis, Nikos [1]
Krinidis, Michail [1]
Pitas, Ioannis [1]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
Speaker detection; visual speech detection; speech; features
DOI
10.1109/TCSVT.2008.2009262
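Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract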
In this letter, we introduce a novel approach for lip activity detection and speaker detection that uses solely visual information. The main idea is to apply signal detection algorithms to a simple, easily extracted feature of the mouth region. We argue that the mouth region of a speaking person exhibits an increased mean and standard deviation in the number of low-intensity pixels, and that these can serve as visual cues for detecting visual speech. We then derive a statistical algorithm that exploits this fact to efficiently characterize visual speech and silence in video sequences. Furthermore, we employ the lip activity detection method to determine the active speaker(s) in a multi-person environment.
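The feature described in the abstract can be illustrated with a short sketch. This is not the authors' algorithm: it assumes the mouth region has already been located and cropped to a grayscale array, uses a hypothetical darkness cut-off (`dark_thr`), and replaces the letter's statistical detector with a simple fixed-threshold rule on sliding-window statistics (`mean_thr`, `std_thr`). The active-speaker rule (largest average window standard deviation) is likewise an assumption chosen for illustration. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# A grayscale mouth-region crop: rows of 8-bit intensities (0 = black).
Frame = Sequence[Sequence[int]]


def low_intensity_count(mouth: Frame, dark_thr: int = 60) -> int:
    """Count pixels darker than dark_thr (hypothetical cut-off).

    An open mouth exposes the dark oral cavity, so this count tends to
    rise and fluctuate while a person speaks.
    """
    return sum(1 for row in mouth for px in row if px < dark_thr)


def window_stats(counts: Sequence[int], win: int) -> List[Tuple[float, float]]:
    """Mean and population standard deviation of the per-frame count
    over each sliding window of `win` frames."""
    return [
        (mean(counts[i:i + win]), pstdev(counts[i:i + win]))
        for i in range(len(counts) - win + 1)
    ]


def detect_lip_activity(counts: Sequence[int], win: int,
                        mean_thr: float, std_thr: float) -> List[bool]:
    """Label each window as speech (True) or silence (False).

    ASSUMPTION: a fixed-threshold rule stands in for the letter's
    statistical detector; a window counts as speech only when both its
    mean and its standard deviation exceed the given thresholds.
    """
    return [m > mean_thr and s > std_thr for m, s in window_stats(counts, win)]


def active_speaker(per_person_counts: Sequence[Sequence[int]], win: int) -> int:
    """Index of the person whose count varies most, measured by the
    average window standard deviation (hypothetical scoring rule)."""
    scores = [mean(s for _, s in window_stats(c, win)) for c in per_person_counts]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Synthetic per-frame low-intensity counts for two people.
    silent = [10, 11, 10, 10, 11, 10, 10, 11]
    talking = [10, 45, 15, 50, 12, 48, 20, 44]
    print(detect_lip_activity(silent, win=4, mean_thr=20, std_thr=5))
    # [False, False, False, False, False]
    print(detect_lip_activity(talking, win=4, mean_thr=20, std_thr=5))
    # [True, True, True, True, True]
    print(active_speaker([silent, talking], win=4))
    # 1
```

In practice the per-frame counts would come from calling `low_intensity_count` on each tracked mouth crop, and the thresholds would need to be calibrated (for example on frames known to be silent) rather than hard-coded as in this toy example.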
Pages: 133–137
Number of pages: 5