COMPARATIVE STUDY OF ARABIC AND FRENCH STATISTICAL LANGUAGE MODELS

Times cited: 0
Authors
Meftouh, Karima [1]
Smaili, Kamel [2]
Laskri, Mohamed Tayeb
Affiliations
[1] Badji Mokhtar Univ, Dept Informat, Annaba, Algeria
[2] INRIA LORIA, F-54506 Vandoeuvre Les Nancy, France
Source
ICAART 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE | 2009
Keywords
Statistical language modeling; Arabic; French; Smoothing technique; n-gram model; Vocabulary; Perplexity; Performance; Probabilities
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a comparative study of statistical language models for Arabic and French. The objective of this study is to understand how best to model each of the two languages. Several experiments were carried out using different smoothing techniques. For French, trigram models are the most appropriate regardless of the smoothing technique used. For Arabic, higher-order n-gram models smoothed with the Witten-Bell method are more efficient. The tests were carried out on corpora and vocabularies of comparable size.
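The abstract contrasts n-gram orders and smoothing methods, singling out Witten-Bell smoothing, and perplexity appears among the keywords. As a minimal sketch, assuming the textbook interpolated form of Witten-Bell (not necessarily the variant or toolkit the authors used), the Python below estimates P(w | h) = (c(h, w) + T(h) * P(w | h')) / (c(h) + T(h)), where T(h) is the number of distinct words seen after history h and h' drops the oldest word of h, and then computes perplexity on held-out sentences. The names `WittenBellLM` and `perplexity`, the `<s>`/`</s>` boundary tokens, the closed vocabulary with a uniform base distribution, and the toy French sentences are all hypothetical illustrations; none of them come from the paper.

```python
import math
from collections import Counter, defaultdict


class WittenBellLM:
    """Hypothetical interpolated Witten-Bell n-gram model (sketch, closed vocabulary)."""

    def __init__(self, order, vocab):
        self.order = order
        # "</s>" is predicted like a word; "<s>" only ever appears as padding in histories.
        self.vocab = set(vocab) | {"</s>"}
        # counts[k][history] is a Counter of the words that followed a length-k history.
        self.counts = [defaultdict(Counter) for _ in range(order)]

    def pad(self, sentence):
        return ["<s>"] * (self.order - 1) + list(sentence) + ["</s>"]

    def fit(self, sentences):
        for sentence in sentences:
            tokens = self.pad(sentence)
            for i in range(self.order - 1, len(tokens)):
                # Count tokens[i] after every history length from 0 up to order-1.
                for k in range(self.order):
                    self.counts[k][tuple(tokens[i - k:i])][tokens[i]] += 1
        return self

    def prob(self, word, history):
        """P(word | the last order-1 tokens of history)."""
        keep = self.order - 1
        history = tuple(history)[-keep:] if keep > 0 else ()
        return self._interpolate(word, history)

    def _interpolate(self, word, history):
        # Lower-order estimate: drop the oldest word, or use a uniform distribution at the bottom.
        if history:
            lower = self._interpolate(word, history[1:])
        else:
            lower = 1.0 / len(self.vocab)
        followers = self.counts[len(history)].get(history)
        if not followers:  # unseen history: rely entirely on the lower order
            return lower
        total = sum(followers.values())  # c(h)
        types = len(followers)           # T(h): distinct words seen after h
        return (followers[word] + types * lower) / (total + types)


def perplexity(model, sentences):
    """exp of the average negative log-probability per predicted token (</s> included)."""
    log_sum, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = model.pad(sentence)
        for i in range(model.order - 1, len(tokens)):
            log_sum += math.log(model.prob(tokens[i], tokens[:i]))
            n_tokens += 1
    return math.exp(-log_sum / n_tokens)


if __name__ == "__main__":
    # Toy tokenized sentences made up for illustration; not the paper's corpora.
    train = [["le", "chat", "dort"], ["le", "chien", "dort"]]
    test = [["le", "chat", "dort"]]
    vocab = {w for s in train for w in s}
    for n in (1, 2, 3):
        lm = WittenBellLM(n, vocab).fit(train)
        print(f"{n}-gram perplexity: {perplexity(lm, test):.3f}")
```

The design choice that distinguishes Witten-Bell is the weight on the lower order, T(h) / (c(h) + T(h)): it grows with the number of distinct words seen after h, so histories with many different continuations lean more on the shorter context. Comparing orders, as the study does, amounts to fitting several models on the same training data and comparing their perplexities on the same test set.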
Pages: 156+
Page count: 3