Online Incremental Learning for Speaker-Adaptive Language Models

Authors
Hu, Chih Chi [1]
Liu, Bing [1]
Shen, John Paul [1]
Lane, Ian [1]
Affiliation
[1] Carnegie Mellon Univ, Elect & Comp Engn, Pittsburgh, PA 15213 USA
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
Automatic Speech Recognition; Online Learning; Language Modeling; Speaker-Adaptation; Speaker Specific Modeling; Recurrent Neural Networks; ADAPTATION;
DOI
10.21437/Interspeech.2018-2259
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice control is a prominent interaction method on personal computing devices. While automatic speech recognition (ASR) systems are readily applicable to large audiences, there is room for further adaptation at the edge, i.e., locally on devices, targeted at individual users. In this work, we explore improving ASR systems over time through a user's own interactions. Our online learning approach for speaker-adaptive language modeling leverages a user's most recent utterances to enhance the speaker-dependent features and traits. We experiment with the Large Vocabulary Continuous Speech Recognition corpus Tedlium v2, and demonstrate an average reduction in perplexity (PPL) of 19.18% and an average relative reduction in word error rate (WER) of 2.80% compared to a state-of-the-art baseline.
Pages: 3363-3367
Page count: 5