Voice Activity Detection Using Discriminative Restricted Boltzmann Machines

Cited by: 0

Authors:

Borin, Rogerio G. [1]

Silva, Magno T. M. [1]

Affiliations:

[1] Univ Sao Paulo, Escola Politecn, Sao Paulo, Brazil

Source:

2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2017

Keywords:

HIGHER-ORDER STATISTICS; DEEP BELIEF NETWORKS; SPEECH; MODEL;

DOI:

Not available

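Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes: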

0808 ; 0809 ;

Abstract:

Voice activity detection (VAD) plays an important role in current technological applications such as wireless communications and speech recognition. In this paper, we address the VAD task through machine learning by using a discriminative restricted Boltzmann machine (DRBM). We extend the conventional DRBM to handle continuous-valued data and employ feature vectors based on either mel-frequency cepstral coefficients or filter-bank energies. The resulting detector slightly outperforms the VAD often used as a benchmark for detector comparison. Results also indicate that the DRBM is able to deal with strongly correlated feature vectors.
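The abstract does not give the model equations. As a rough illustration only, the sketch below computes the class posterior p(y | x) of a DRBM in its commonly used textbook form, applied to a real-valued feature vector. It assumes Gaussian visible units with unit variance, in which case the input-only energy term cancels in the posterior and the formula matches the binary-input case. All names, dimensions, the random parameters, and the 0.5 decision threshold are hypothetical and not taken from the paper; the authors' continuous-valued extension may differ.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def drbm_posterior(x, W, U, c, d):
    """Class posterior p(y | x) of a discriminative RBM (illustrative form).

    x: (n_features,) real-valued features, assumed standardized
    W: (n_hidden, n_features) input-to-hidden weights
    U: (n_hidden, n_classes) class-to-hidden weights
    c: (n_hidden,) hidden biases
    d: (n_classes,) class biases
    """
    # Hidden pre-activation for each candidate class: (n_hidden, n_classes).
    pre = (c + W @ x)[:, None] + U
    # Unnormalized log-probability of each class.
    logits = d + softplus(pre).sum(axis=0)
    # Subtract the maximum before exponentiating to avoid overflow.
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()


# Hypothetical setup: 13 cepstral-like features, 8 hidden units,
# and 2 classes (index 0 = non-speech, index 1 = speech).
rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 13, 8, 2
W = 0.1 * rng.standard_normal((n_hidden, n_features))
U = 0.1 * rng.standard_normal((n_hidden, n_classes))
c = np.zeros(n_hidden)
d = np.zeros(n_classes)

x = rng.standard_normal(n_features)  # stand-in for one frame's features
posterior = drbm_posterior(x, W, U, c, d)
is_speech = bool(posterior[1] > 0.5)  # assumed decision rule
```

In a real detector the parameters would be learned from labeled speech and non-speech frames rather than drawn at random; the random values here only make the sketch runnable.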


Pages: 523-527

Number of pages: 5
