Improving Real-time Recognition of Morphologically Rich Speech with Transformer Language Model

Cited: 0
Authors
Tarjan, Balazs [1 ,2 ]
Szaszak, Gyorgy [1 ]
Fegyo, Tibor [1 ,2 ]
Mihajlik, Peter [1 ,3 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] SpeechTex Ltd, Budapest, Hungary
[3] THINKTech Res Ctr, Vac, Hungary
Source
2020 11TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2020) | 2020
Keywords
ASR; Transformer; data augmentation; subword unit; neural text generation; conversational speech; call center conversations; morphologically rich language
DOI
10.1109/coginfocom50765.2020.9237817
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Transformer models have become the state of the art in natural language understanding, and their use for language modeling in Automatic Speech Recognition (ASR) is also promising. Although Transformer-based language models have been shown to improve ASR performance, their computational complexity makes their application in real-time systems challenging. It has also been shown that the knowledge of such language models can be transferred to traditional n-gram models, which are suitable for real-time decoding. This paper investigates the adaptation of this transfer approach to morphologically rich languages in a real-time scenario. We propose a new method for subword-based neural text augmentation with a Transformer language model, in which the training corpus is retokenized into subwords using a statistical, data-driven approach. We demonstrate that ASR performance can be improved while also reducing the vocabulary size and alleviating memory consumption.
Pages: 491-495
Page count: 5