THE DIRHA-ENGLISH CORPUS AND RELATED TASKS FOR DISTANT-SPEECH RECOGNITION IN DOMESTIC ENVIRONMENTS

Cited by: 0

Authors:

Ravanelli, Mirco [1]

Cristoforetti, Luca [1]

Gretter, Roberto [1]

Pellin, Marco [1]

Sosi, Alessandro [1]

Omologo, Maurizio [1]

Affiliation:

[1] Fondazione Bruno Kessler, I-38123 Povo, Trento, Italy

Source:

2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2015

Keywords:

distant speech recognition; microphone arrays; corpora; Kaldi; DNN

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces the contents and possible uses of the DIRHA-ENGLISH multi-microphone corpus, recently created under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus comprises both real and simulated material and includes 12 US and 12 UK native English speakers. Each speaker uttered different sets of phonetically rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also include typical domestic background noise as well as inter- and intra-room reverberation effects. Dev and test sets were derived, which provide valuable material for studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNNs), in line with the international state of the art.

Pages: 275-282

Page count: 8
