Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding

Authors
Nakamura, S [1]
Hiyane, K [1]
Asano, F [1]
Kaneda, Y [1]
Yamada, T [1]
Nishiura, T [1]
Kobayashi, T [1]
Ise, S [1]
Saruwatari, H [1]
Affiliations
[1] ATR Spoken Language Translat Labs, Kyoto 6190288, Japan
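Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract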
Sound data for open evaluation is necessary for studies such as sound source localization, sound retrieval, sound recognition, and hands-free speech recognition in real acoustic environments. This paper reports on our project aimed at collecting such acoustic data. There are many kinds of sound scenes in real environments. A sound scene is specified by its sound sources and room acoustics. The number of combinations of sound sources, source positions, and rooms is huge in real acoustic environments. We assumed that sound in these environments can be simulated by convolving isolated sound sources with impulse responses. As isolated sound sources, a hundred kinds of environmental sounds and speech sounds were collected. Impulse responses were collected in various acoustic environments. Additionally, we collected sounds from a moving source. This paper describes the progress of our sound scene database collection project and its application to environmental sound recognition and hands-free speech recognition.
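The simulation assumption stated in the abstract, that an observed sound can be approximated by convolving an isolated source with an impulse response, can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function name `simulate_observation` and the toy signals below are hypothetical, chosen only to show the operation with NumPy.

```python
import numpy as np


def simulate_observation(dry_source, impulse_response):
    """Convolve an isolated (dry) source with an impulse response.

    Hypothetical helper for illustration; both inputs are 1-D sample
    sequences assumed to share the same sampling rate.
    """
    dry_source = np.asarray(dry_source, dtype=float)
    impulse_response = np.asarray(impulse_response, dtype=float)
    # Full linear convolution: output length is len(source) + len(ir) - 1.
    return np.convolve(dry_source, impulse_response, mode="full")


# Toy data (assumed, not from the database): a single click as the source,
# and an impulse response with a direct path plus one weaker echo.
dry = np.array([1.0, 0.0, 0.0, 0.0])
rir = np.array([1.0, 0.0, 0.5])
observed = simulate_observation(dry, rir)
```

With real recordings, the same call would take a dry recording and a measured impulse response as inputs. For long impulse responses an FFT-based convolution is usually faster, but the result is the same operation.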
Pages: A161-A164
Number of pages: 4