Robust Understanding of Robot-Directed Speech Commands Using Sequence to Sequence With Noise Injection

Cited by: 14
Authors
Tada, Yuuki [1]
Hagiwara, Yoshinobu [1]
Tanaka, Hiroki [1]
Taniguchi, Tadahiro [1]
Affiliation
[1] Ritsumeikan University, Emergent Systems Laboratory, College of Information Science and Engineering, Kusatsu, Shiga, Japan
Source
Frontiers in Robotics and AI | 2020 / Vol. 6
Keywords
language understanding; service robot; speech recognition; semantic parsing; robot-directed speech detection; RECOGNITION;
DOI
10.3389/frobt.2019.00144
Chinese Library Classification (CLC) number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
This paper describes a new method that enables a service robot to understand spoken commands robustly by combining off-the-shelf automatic speech recognition (ASR) systems with an encoder-decoder neural network trained with noise injection. In service robotics, understanding spoken commands is often modeled as mapping speech signals to a sequence of commands that a robot can understand and execute. In the conventional approach, the speech signal is first recognized, and semantic parsing is then applied to infer the command sequence from the recognized utterance. However, when speech recognition errors occur, conventional semantic parsing cannot be applied reliably, because most natural language processing methods do not account for such errors. We propose using encoder-decoder neural networks, such as sequence-to-sequence models, with noise injection: noise is injected into phoneme sequences during the training phase of an encoder-decoder semantic parsing system. We demonstrate that noise injection mitigates the negative effects of speech recognition errors on the understanding of robot-directed speech commands, i.e., it improves semantic parsing performance. We implemented the method and evaluated it using commands from a general purpose service robot (GPSR) task, such as the one used in RoboCup@Home, a standard competition for testing service robots. The experimental results show that the proposed method, sequence to sequence with noise injection (Seq2Seq-NI), outperforms the baseline methods. In addition, Seq2Seq-NI enables a robot to understand a spoken command even when the transcription produced by an off-the-shelf ASR system contains recognition errors. We also report an experiment evaluating the influence of the injected noise and discuss its results.
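The abstract does not specify the noise model. The following Python sketch shows one plausible form of training-time noise injection, assuming that "noise" means random phoneme substitutions, deletions, and insertions applied at a per-phoneme rate. The phoneme inventory, the helpers `inject_noise` and `make_training_pairs`, and the example command are hypothetical illustrations; the paper's actual noise model and data format may differ.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical phoneme inventory for illustration only; a real system would
# use the full phoneme set produced by its ASR / grapheme-to-phoneme front end.
PHONEMES: List[str] = ["a", "i", "u", "e", "o", "k", "s", "t", "n", "h",
                       "m", "r", "g", "b", "p", "d", "sh", "ch"]


def inject_noise(phonemes: Sequence[str], rate: float, rng: random.Random,
                 inventory: Sequence[str] = PHONEMES) -> List[str]:
    """Return a corrupted copy of `phonemes` (the input is not modified).

    With probability `rate`, each phoneme undergoes one randomly chosen edit
    (an assumed edit model, not necessarily the paper's):
      - "sub": replaced by a random phoneme (which may equal the original),
      - "del": dropped,
      - "ins": kept, followed by an inserted random phoneme.
    """
    noisy: List[str] = []
    for p in phonemes:
        if rng.random() >= rate:
            noisy.append(p)  # no noise at this position
            continue
        op = rng.choice(("sub", "del", "ins"))
        if op == "sub":
            noisy.append(rng.choice(inventory))
        elif op == "ins":
            noisy.append(p)
            noisy.append(rng.choice(inventory))
        # op == "del": append nothing, i.e., the phoneme is deleted
    return noisy


def make_training_pairs(dataset: Sequence[Tuple[Sequence[str], Sequence[str]]],
                        rate: float, rng: random.Random,
                        copies: int = 1) -> List[Tuple[List[str], List[str]]]:
    """Pair noisy phoneme inputs with their clean target command sequences.

    Only the encoder input is corrupted; the decoder target stays clean, so a
    Seq2Seq model is trained to recover the correct command despite errors.
    """
    pairs: List[Tuple[List[str], List[str]]] = []
    for phonemes, command in dataset:
        for _ in range(copies):
            pairs.append((inject_noise(phonemes, rate, rng), list(command)))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example: phonemes for "bring" and a toy command sequence.
    data = [(["b", "r", "i", "n", "g"], ["bring", "apple", "kitchen"])]
    for noisy, target in make_training_pairs(data, rate=0.2, rng=rng, copies=3):
        print(noisy, "->", target)
```

In this sketch a fresh noisy copy can be drawn each epoch, so the parser sees many ASR-like corruptions of the same utterance paired with one clean command sequence; the `rate` parameter plays the role of the noise level whose influence the paper evaluates.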
Pages: 12