Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance

Cited by: 0
Authors
Kitaoka, Norihide [1]
Yamamoto, Kazumasa [2]
Kusamizu, Tomohiro [2]
Nakagawa, Seiichi [2]
Yamada, Takeshi [3]
Tsuge, Satoru [4]
Miyajima, Chiyomi [1]
Nishiura, Takanobu [5]
Nakayama, Masato [5]
Denda, Yuki [5]
Fujimoto, Masakiyo [6]
Takiguchi, Tetsuya [7]
Tamura, Satoshi [8]
Kuroiwa, Shingo [4]
Takeda, Kazuya [1]
Nakamura, Satoshi [9]
Affiliations
[1] Nagoya Univ, Nagoya, Aichi 4648601, Japan
[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
[3] Univ Tsukuba, Tsukuba, Ibaraki, Japan
[4] Univ Tokushima, Tokushima 770, Japan
[5] Ritsumeikan Univ, Kyoto, Japan
[6] Nippon Telegraph & Tel Corp, Tokyo, Japan
[7] Kobe Univ, Kobe, Hyogo 657, Japan
[8] Gifu Univ, Gifu, Japan
[9] NICT, ATR, Jaipur, Rajasthan, India
Source
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2 | 2007
Keywords
voice activity detection; noisy speech recognition; evaluation framework
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice activity detection (VAD) plays an important role in speech processing, including speech recognition, speech enhancement, and speech coding in noisy environments. We developed an evaluation framework for VAD in such environments, called Corpus and Environment for Noisy Speech Recognition 1 Concatenated (CENSREC-1-C). The framework consists of noisy continuous digit utterances and tools for evaluating VAD results. Adopting two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance, we provide the evaluation results of a power-based VAD method as a baseline. When VAD is used in a speech recognizer, the detected speech segments are extended to avoid the loss of speech frames, and the pause segments are then absorbed by a pause model. We investigate the balance between explicit segmentation by VAD and implicit segmentation by a pause model using an experimental simulation of segment extension, and show that a small extension improves speech recognition.
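The following is an illustrative sketch only, not the CENSREC-1-C tools or the authors' baseline. It shows one plausible reading of the ideas in the abstract: a power-threshold VAD, extension of detected segments by a fixed margin, and frame-level scoring. The function names, the non-overlapping framing, the fixed threshold, and the choice of false rejection rate and false acceptance rate as the frame-level measures are all assumptions made for this example; the record does not specify them.

```python
def frame_powers(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed framing)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def power_vad(powers, threshold):
    """Mark a frame as speech when its power exceeds a fixed threshold."""
    return [p > threshold for p in powers]


def extend_segments(decisions, margin):
    """Extend every detected speech frame by `margin` frames on both sides."""
    n = len(decisions)
    extended = [False] * n
    for i, is_speech in enumerate(decisions):
        if is_speech:
            for j in range(max(0, i - margin), min(n, i + margin + 1)):
                extended[j] = True
    return extended


def frame_error_rates(reference, detected):
    """Return (false rejection rate, false acceptance rate) over frames.

    FRR: fraction of reference speech frames detected as non-speech.
    FAR: fraction of reference non-speech frames detected as speech.
    """
    speech = sum(reference)
    nonspeech = len(reference) - speech
    false_reject = sum(r and not d for r, d in zip(reference, detected))
    false_accept = sum(d and not r for r, d in zip(reference, detected))
    frr = false_reject / speech if speech else 0.0
    far = false_accept / nonspeech if nonspeech else 0.0
    return frr, far
```

In this sketch, increasing `margin` trades false rejections for false acceptances: fewer speech frames are lost, while more pause frames are passed on for a recognizer's pause model to absorb.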
Pages: 607 / +
Page count: 2
Related papers
1 item in total
  • [1] CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments
    Kitaoka, Norihide
    Yamada, Takeshi
    Tsuge, Satoru
    Miyajima, Chiyomi
    Yamamoto, Kazumasa
    Nishiura, Takanobu
    Nakayama, Masato
    Denda, Yuki
    Fujimoto, Masakiyo
    Takiguchi, Tetsuya
    Tamura, Satoshi
    Matsuda, Shigeki
    Ogawa, Tetsuji
    Kuroiwa, Shingo
    Takeda, Kazuya
    Nakamura, Satoshi
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2009, 30 (05) : 363 - 371