GENERATING SOUND WORDS FROM AUDIO SIGNALS OF ACOUSTIC EVENTS WITH SEQUENCE-TO-SEQUENCE MODEL

Cited by: 0

Authors:

Ikawa, Shota [1]

Kashino, Kunio [1,2]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan

[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Sound word; onomatopoeia; sequence-to-sequence model; sound transcription

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Representing various sounds in language, for example as sound words (onomatopoeias), is not only useful as an auxiliary means for automatic speech recognition, but also essential in emerging fields such as natural human-machine communication, searching audio archives for acoustic events, and abnormality detection based on sounds. This paper proposes a novel method for generating sound words from audio signals. The method is based on an end-to-end, sequence-to-sequence framework and addresses two problems: the audio segmentation problem, that is, finding the segment of the audio signal along time that corresponds to a sequence of phonemes, and the ambiguity problem, in which multiple words may correspond to the same sound depending on the situation or the listener. Our tests show that the method worked efficiently and achieved a 2.8% mean phoneme error rate (MPER) and a 7.2% word error rate (WER) in a sound word generation task.
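As an illustration of the two metrics named in the abstract, the following is a minimal sketch of how a mean phoneme error rate and a word error rate could be computed. This record does not give the paper's exact definitions, so the sketch assumes that the phoneme error rate of one sample is the Levenshtein distance between the generated and reference phoneme sequences divided by the reference length, that MPER is its mean over samples, and that WER is the fraction of generated words that do not match the reference exactly. All function names and the sample phoneme sequences are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: edit distance normalized by reference length."""
    if len(ref) == 0:
        raise ValueError("reference phoneme sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_phoneme_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Assumed definition: mean of per-sample phoneme error rates."""
    return sum(phoneme_error_rate(r, h) for r, h in pairs) / len(pairs)


def word_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Assumed definition: fraction of samples whose generated word is not an exact match."""
    return sum(1 for r, h in pairs if list(r) != list(h)) / len(pairs)


# Hypothetical (reference, generated) phoneme sequences for two sounds.
pairs = [
    (["p", "i", "p", "i"], ["p", "i", "p", "i"]),  # exact match
    (["k", "a", "N"], ["k", "o", "N"]),            # one substituted phoneme
]
mper = mean_phoneme_error_rate(pairs)
wer = word_error_rate(pairs)
```

Under these assumptions a single wrong phoneme counts as one full word error but only a fractional phoneme error, which is consistent with the WER reported in the abstract being higher than the MPER.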


Pages: 346-350

Page count: 5
