Paragraph Vector Based Topic Model for Language Model Adaptation

Cited by: 0
Authors
Jin, Wengong [1]
He, Tianxing [1]
Qian, Yanmin [1]
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
language model adaptation; representation learning; topic model;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Topic modeling is an important approach to language model (LM) adaptation and has long attracted research interest. Latent Dirichlet Allocation (LDA), which assumes a generative Dirichlet distribution with bag-of-words features for hidden topics, has been widely used as the state-of-the-art topic model. Inspired by the recent development of paragraph vector, a new paradigm of distributed paragraph representation, this work proposes a new topic model based on paragraph vectors. During training, each paragraph is mapped to a unique vector in a continuous space. Unsupervised clustering is then performed to construct topic clusters, and topic-specific LMs are built from the clustering results. During adaptation, the topic posterior is first estimated using the paragraph vector based topic model, and new adapted LMs are constructed by interpolating the existing topic-specific models using the topic posteriors. The proposed topic model is applied to N-gram LM adaptation and evaluated on the Amazon Product Review Corpus for perplexity and on a Chinese LVCSR task for character error rate (CER). Results show that the proposed approach yields an 11.1% relative perplexity reduction and a 1.4% relative CER reduction over the N-gram baseline, outperforming the LDA-based method proposed in previous work.
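The adaptation step described in the abstract (estimate a topic posterior, then interpolate topic-specific LMs) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the posterior is computed, so the softmax over negative squared distances to cluster centroids is an assumption, as are the names `topic_posterior`, `adapted_prob`, the `temperature` parameter, and the toy centroids and probabilities.

```python
import math

def topic_posterior(vec, centroids, temperature=1.0):
    """Assumed scoring rule: softmax over negative squared distances
    between a paragraph vector and each topic cluster centroid."""
    scores = [
        -sum((v - c) ** 2 for v, c in zip(vec, centroid)) / temperature
        for centroid in centroids
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adapted_prob(word, history, topic_lms, posterior):
    """Linear interpolation of topic-specific LMs:
    P(w | h) = sum_k posterior[k] * P_k(w | h)."""
    return sum(p * lm(word, history) for p, lm in zip(posterior, topic_lms))

# Toy data (hypothetical): two topic centroids and two tiny topic LMs
# that ignore the history and return fixed word probabilities.
centroids = [[0.0, 0.0], [4.0, 4.0]]
topic_lms = [
    lambda w, h: {"battery": 0.7, "plot": 0.3}[w],
    lambda w, h: {"battery": 0.2, "plot": 0.8}[w],
]

posterior = topic_posterior([0.5, 0.0], centroids)
p_battery = adapted_prob("battery", (), topic_lms, posterior)
print(posterior, p_battery)
```

Because the toy paragraph vector lies close to the first centroid, the first topic LM dominates the interpolated probability. In a real system the callables would wrap N-gram models trained on each cluster's text.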
Pages: 3516-3520
Number of pages: 5
Related papers
50 in total
  • [1] An unsupervised Web-based topic language model adaptation method
    Lecorve, Gwenole
    Gravier, Guillaume
    Sebillot, Pascale
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5081 - 5084
  • [2] Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation
    Jeon, Hyung-Bae
    Lee, Soo-Young
    ETRI JOURNAL, 2016, 38 (03) : 487 - 493
  • [3] Learning Latent Topic Information for Language Model Adaptation
    Lu, Shixiang
    Wei, Wei
    Fu, Xiaoyin
    Fan, Lichun
    Xu, Bo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, 2012, 333 : 143 - 153
  • [4] Leveraging Social Annotation for Topic Language Model Adaptation
    Wu, Youzheng
    Abe, Kazuhiko
    Dixon, Paul
    Hori, Chiori
    Kashioka, Hideki
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 190 - 193
  • [5] Unsupervised Language Model Adaptation Based on Topic and Role Information in Multiparty Meetings
    Huang, Songfang
    Renals, Steve
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 833 - 836
  • [6] PLSA-based Topic Detection in Meetings for Adaptation of Lexicon and Language Model
    Akita, Yuya
    Nemoto, Yusuke
    Kawahara, Tatsuya
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1321 - 1324
  • [7] Paragraph Vector Based Retrieval Model for Similar Cases Recommendation
    Zhao, Yifei
    Wang, Jing
    Wang, Fei-Yue
    Shi, Xiaobo
    Lv, Yisheng
    PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 2220 - 2225
  • [8] Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval
    Ai, Qingyao
    Yang, Liu
    Guo, Jiafeng
    Croft, W. Bruce
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 869 - 872
  • [9] Robust topic inference for latent semantic language model adaptation
    Heidel, Aaron
    Lee, Lin-shan
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 177 - 182
  • [10] Language model adaptation through topic decomposition and MDI estimation
    Federico, M
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 773 - 776