Online Incremental Learning for Speaker-Adaptive Language Models

Cited by: 0

Authors:

Hu, Chih Chi [1]

Liu, Bing [1]

Shen, John Paul [1]

Lane, Ian [1]

Affiliations:

[1] Carnegie Mellon Univ, Elect & Comp Engn, Pittsburgh, PA 15213 USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Automatic Speech Recognition; Online Learning; Language Modeling; Speaker-Adaptation; Speaker Specific Modeling; Recurrent Neural Networks; ADAPTATION

DOI:

10.21437/Interspeech.2018-2259

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Voice control is a prominent interaction method on personal computing devices. While automatic speech recognition (ASR) systems are readily applicable for large audiences, there is room for further adaptation at the edge, i.e., locally on devices, targeted at individual users. In this work, we explore improving ASR systems over time through a user's own interactions. Our online learning approach for speaker-adaptive language modeling leverages a user's most recent utterances to enhance the speaker-dependent features and traits. We experiment with the Large Vocabulary Continuous Speech Recognition corpus Tedlium v2, and demonstrate an average reduction in perplexity (PPL) of 19.18% and an average relative reduction in word error rate (WER) of 2.80% compared to a state-of-the-art baseline on Tedlium v2.


Pages: 3363-3367

Number of pages: 5

