Rich Morphology Based N-gram Language Models for Arabic

Cited by: 0
Authors
Emami, Ahmad [1]
Zitouni, Imed [1]
Mangu, Lidia [1]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
Language Modeling; Arabic Morphology; Rich Language Modeling
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate the use of rich morphological information, such as word segmentation, part-of-speech tagging, and diacritic restoration, to improve Arabic language modeling. We enrich the context by performing morphological analysis on the word history. We use neural network models to integrate this additional information because of their ability to handle long and enriched dependencies. We experimented with models that use an increasing number of morphological features, starting with Arabic segmentation and later adding part-of-speech labels as well as words with restored diacritics. Experiments on Arabic broadcast news and broadcast conversation data showed significant improvements in perplexity, reducing the perplexities of the baseline N-gram model and the neural network N-gram model by 35% and 31%, respectively.
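The abstract reports its gains as relative perplexity reductions. As a minimal sketch of how such a figure is typically computed, the snippet below uses the standard definition of perplexity (the exponential of the average negative log-probability per token) and a relative reduction against a baseline. The function names and the toy probabilities are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math


def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def relative_reduction(baseline_ppl, new_ppl):
    """Fractional perplexity reduction relative to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Toy per-token probabilities, invented for illustration only.
baseline_probs = [0.25, 0.25, 0.25, 0.25]  # hypothetical baseline model
enriched_probs = [0.5, 0.5, 0.5, 0.5]      # hypothetical enriched model

base = perplexity(baseline_probs)
rich = perplexity(enriched_probs)
print(f"baseline={base:.2f} enriched={rich:.2f} "
      f"reduction={relative_reduction(base, rich):.0%}")
```

Under this definition, a reported 35% reduction would mean the enriched model's perplexity is 0.65 times that of the baseline on the same test data.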
Pages: 829-832
Page count: 4