Low Frequency Ultrasonic Voice Activity Detection using Convolutional Neural Networks

Citations: 0
Authors
McLoughlin, Ian [1, 2]
Song, Yan [2]
Affiliations
[1] Univ Kent, Sch Comp Sci, Rochester, Kent, England
[2] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
Voice activity detection; speech activity detection; ultrasonic speech; SaVAD
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Low frequency ultrasonic mouth state detection uses audio chirps reflected from the face in the region of the mouth to determine lip state: open, closed or partially open. The chirps occupy a frequency range just above the threshold of human hearing, so they are both inaudible and unaffected by interfering speech, yet they can be produced and sensed using inexpensive equipment. To determine whether the mouth is open or closed, and hence form a measure of voice activity detection, this recently invented technique relies on the difference in the reflected chirp caused by resonances introduced by the open or partially open mouth cavity. Voice activity is then inferred from lip state through patterns of mouth movement, in a similar way to video-based lip-reading technologies. This paper introduces a new metric based on spectrogram features extracted from the reflected chirp, with a convolutional neural network classification back-end, that yields excellent performance without needing the periodic resetting of the template closed-mouth reflection required by the original technique.
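As a rough illustration of the signal path the abstract describes, the following Python sketch generates a linear chirp just above the audible range and computes log-spectrogram features restricted to the chirp band. It is a minimal sketch, not the authors' implementation: the 20 to 24 kHz sweep, the 96 kHz sampling rate, the frame sizes and the names `linear_chirp` and `spectrogram` are all assumptions chosen for illustration, and the clean transmitted chirp stands in for a recorded reflection. The convolutional neural network back-end is not shown.

```python
import numpy as np

# All numeric values below are illustrative assumptions, not taken from the paper.
FS = 96_000          # sampling rate in Hz (assumed)
F_START = 20_000.0   # chirp start frequency in Hz (assumed, just above hearing)
F_END = 24_000.0     # chirp end frequency in Hz (assumed)
DURATION = 0.1       # chirp length in seconds (assumed)
FRAME_LEN = 512      # analysis frame length in samples (assumed)
HOP = 256            # hop between frames in samples (assumed)


def linear_chirp(f0, f1, duration, fs):
    """Return a sine sweep whose frequency rises linearly from f0 to f1."""
    t = np.arange(int(round(duration * fs))) / fs
    # Instantaneous frequency f0 + (f1 - f0) * t / duration, integrated to phase.
    phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t ** 2)
    return np.sin(phase)


def spectrogram(x, frame_len, hop):
    """Return a power spectrogram of shape (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


# Stand-in for the reflected signal: here simply the transmitted chirp.
chirp = linear_chirp(F_START, F_END, DURATION, FS)
spec = spectrogram(chirp, FRAME_LEN, HOP)

# Keep only the bins inside the chirp band and take the log power.
freqs = np.fft.rfftfreq(FRAME_LEN, d=1.0 / FS)
band = (freqs >= F_START) & (freqs <= F_END)
features = np.log(spec[:, band] + 1e-12)
```

In a real system the array `features` would be computed from the microphone recording of the reflected chirp, and a two-dimensional patch of this kind is the sort of input a convolutional classifier could label as mouth open, closed or partially open.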
Pages: 2400-2404
Number of pages: 5