Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding

Cited by: 0
Authors
Vukotic, Vedran [1 ]
Raymond, Christian [2 ,3 ]
Affiliations
[1] LAMARK IMATAG, Rennes, France
[2] INSA Rennes, Rennes, France
[3] INRIA IRISA, Rennes, France
Source
INTERSPEECH 2019 | 2019
Keywords
spoken language understanding; recurrent neural networks; RNN; triplets; triplet loss; triplet mining; hard triplets; long short-term memory; LSTM; gated recurrent units; GRU; ATIS; SNIPS; MEDIA; deep learning
DOI
10.21437/Interspeech.2019-2977
Abstract
The typical RNN (Recurrent Neural Network) pipeline in SLU (Spoken Language Understanding), and specifically in the slot-filling task, consists of three stages: word embedding, context window representation, and label prediction. Label prediction, as a classification task, is the stage that shapes a sensible context window representation during learning through back-propagation. However, due to natural variations in the data, two samples with the same label can end up with dissimilar representations, while two samples with different labels can end up with close representations. In computer vision, specifically in face recognition and person re-identification, this problem has recently been tackled successfully by introducing data triplets and a triplet loss function. In SLU, each word can map to one of several labels depending on small variations of its context. We exploit this fact to construct data triplets consisting of the same word in different contexts: one pair of datapoints with matching target labels and another pair with non-matching labels. Using these triplets and an additional loss function, we refine the context window representation, pushing dissimilar samples apart and pulling similar samples closer, which leads to better classification results and faster convergence.
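The abstract's core idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the exhaustive mining strategy, and the Euclidean distance are illustrative assumptions; the paper's actual triplets are built over RNN context-window representations during training.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge-style triplet loss: pull the same-label pair together and
    # push the different-label pair at least `margin` further apart.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def mine_triplets(samples):
    # Build (anchor, positive, negative) index triplets over occurrences
    # of the SAME word: the positive shares the anchor's label, the
    # negative carries a different label. `samples` is a list of
    # (word, label) pairs, one per context window (hypothetical format).
    triplets = []
    for i, (w_a, l_a) in enumerate(samples):
        for j, (w_p, l_p) in enumerate(samples):
            if j == i or w_p != w_a or l_p != l_a:
                continue
            for k, (w_n, l_n) in enumerate(samples):
                if w_n == w_a and l_n != l_a:
                    triplets.append((i, j, k))
    return triplets
```

For example, with three occurrences of the word "boston" labeled `from_city`, `from_city`, and `to_city` (a made-up ATIS-style case), `mine_triplets` yields the two triplets anchored at the `from_city` occurrences; the `to_city` occurrence has no same-label partner and anchors none.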
Pages: 1178-1182 (5 pages)