DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION

Cited by: 0
Authors
Zhao, Yong [1]
Li, Jinyu [1]
Zhang, Shixiong [1]
Chen, Liping [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA
Keywords
deep neural network; domain adaptation; speaker adaptation; anchor embedding
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice assistants represent one of the most popular and important scenarios for speech recognition. In this paper, we propose two adaptation approaches to customize a well-trained multi-style acoustic model for one of its subdomains, the Cortana assistant. First, we present anchor-based speaker adaptation, which extracts speaker information, in the form of i-vector or d-vector embeddings, from the 'Hey Cortana' anchor segments. The anchor embeddings are mapped to layer-wise parameters that control the transformations of both the weight matrices and the biases of multiple layers. Second, we directly update the existing model parameters for domain adaptation. We demonstrate that the prior distribution should be updated along with the network adaptation to compensate for the label bias from the development data. Updating the priors may have a significant impact when the target domain features a high occurrence of anchor words. Experiments on the Hey Cortana desktop test set show that both approaches improve recognition accuracy significantly. Anchor-based adaptation using the anchor d-vector, together with prior interpolation, achieves a 32% relative reduction in word error rate (WER) over the generic model.
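The abstract does not give a formula for prior interpolation, so the following is only a minimal sketch under stated assumptions: the state priors of the generic model and of the adapted domain are mixed linearly with a hypothetical weight `lam`, and the result is used in the standard hybrid DNN-HMM scaled likelihood, log p(x|s) ∝ log p(s|x) - log p(s). The function names, the weight, and the toy numbers are illustrative and are not taken from the paper.

```python
import math


def interpolate_priors(generic_prior, domain_prior, lam):
    """Linearly mix two state-prior distributions.

    lam is a hypothetical interpolation weight (assumption, not from the paper):
    lam = 0 keeps the generic prior, lam = 1 uses the domain prior only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(generic_prior) != len(domain_prior):
        raise ValueError("prior vectors must have the same length")
    mixed = [(1.0 - lam) * g + lam * d for g, d in zip(generic_prior, domain_prior)]
    total = sum(mixed)
    # Renormalize to guard against rounding drift in the inputs.
    return [m / total for m in mixed]


def scaled_log_likelihoods(log_posteriors, prior, floor=1e-10):
    """Hybrid DNN-HMM convention: subtract the log prior from the log posterior."""
    return [lp - math.log(max(p, floor)) for lp, p in zip(log_posteriors, prior)]


# Toy example with three states; state 2 stands in for an anchor-word state
# that is rare in the generic data but frequent in the target domain.
generic_prior = [0.5, 0.4, 0.1]
domain_prior = [0.3, 0.3, 0.4]
mixed_prior = interpolate_priors(generic_prior, domain_prior, lam=0.5)

log_post = [math.log(0.2), math.log(0.2), math.log(0.6)]
scores = scaled_log_likelihoods(log_post, mixed_prior)
```

In this toy setting the interpolated prior raises the mass of the anchor-like state relative to the generic prior, so its scaled likelihood is discounted more strongly than it would be with the generic prior alone. This is one way to picture the label-bias compensation the abstract mentions.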
Pages: 5984-5988
Page count: 5