Semantically Enhanced Term Frequency based on Word Embeddings for Arabic Information Retrieval

Cited by: 0
Authors
El Mahdaouy, Abdelkader [1, 2]
El Alaoui, Said Ouatik [1]
Gaussier, Eric [2]
Affiliations
[1] Univ USMBA, FSDM, LIM, Fes, Morocco
[2] Univ Grenoble Alpes, CNRS, LIG, AMA, Grenoble, France
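Source
2016 4TH IEEE INTERNATIONAL COLLOQUIUM ON INFORMATION SCIENCE AND TECHNOLOGY (CIST) | 2016
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract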
Traditional Information Retrieval (IR) models are based on the bag-of-words paradigm, where relevance scores are computed from exact keyword matches. Although these models already achieve good performance, it has been shown that most cases of dissatisfaction with relevance are due to term mismatch between queries and documents. In this paper, we introduce a novel method to compute term frequency based on semantic similarities, using distributed representations of words in a vector space (word embeddings). Our main goal is to allow distinct but semantically related terms to match each other and contribute to the relevance scores. Hence, Arabic documents are retrieved beyond the bag-of-words paradigm, based on semantic similarities between word vectors. Results on standard Arabic TREC data sets show significant improvement over the baseline bag-of-words models.
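The abstract does not give the exact formula, so the following Python sketch is only one plausible reading of "term frequency based on semantic similarities", not the authors' method. It assumes that an exact match contributes 1 to the term frequency and that any other document term contributes its cosine similarity to the query term when that similarity reaches a threshold. The function names, the threshold value, and the toy two-dimensional vectors (English words used for readability) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_tf(query_term, doc_tokens, embeddings, threshold=0.7):
    """Similarity-weighted term frequency (assumed formulation, not from the paper).

    Exact matches add 1.0. A different token adds its cosine similarity to the
    query term, but only if both have vectors and the similarity >= threshold.
    """
    q_vec = embeddings.get(query_term)
    tf = 0.0
    for tok in doc_tokens:
        if tok == query_term:
            tf += 1.0
        elif q_vec is not None and tok in embeddings:
            sim = cosine(q_vec, embeddings[tok])
            if sim >= threshold:
                tf += sim
    return tf


# Toy embeddings, invented for illustration only.
EMB = {
    "car": [1.0, 0.0],
    "automobile": [0.8, 0.6],  # cosine with "car" is about 0.8
    "banana": [0.0, 1.0],      # cosine with "car" is 0.0
}

doc = ["automobile", "banana", "car"]
exact_tf = doc.count("car")                            # bag-of-words count
sem_tf = semantic_tf("car", doc, EMB, threshold=0.7)   # adds the related term

print(exact_tf, round(sem_tf, 2))
```

In this toy setting the related term "automobile" raises the frequency of "car" above its exact-match count, while the unrelated "banana" contributes nothing. The resulting value could then stand in for the raw term frequency inside a standard bag-of-words ranking function.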
Pages: 385-389
Number of pages: 5