Information content measures of semantic similarity between documents based on Hadoop system

Cited by: 0
Authors
Birjali, Marouane [1]
Beni-Hssane, Abderrahim [1]
Erritali, Mohammed [2]
Madani, Youness [2]
Affiliations
[1] Univ Chouaib Doukkali, Fac Sci, Dept Comp Sci, El Jadida, Morocco
[2] Univ Sultan Moulay Slimane, Fac Sci & Technol, Dept Comp Sci, Beni Mellal, Morocco
Source
2016 INTERNATIONAL CONFERENCE ON WIRELESS NETWORKS AND MOBILE COMMUNICATIONS (WINCOM) | 2016
Keywords
Distributed processing; Hadoop; Big Data; Semantic similarity; MapReduce programming; WordNet
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Retrieving documents in response to a user's query is the most common text retrieval task. Our work focuses on detecting the semantic similarity between queries and documents in large document collections. In this paper, we investigate MapReduce as a framework for managing distributed processing of datasets and of semantic similarity measures between documents. We then survey the state of the art in approaches for computing the semantic similarity of documents. We propose an approach based on a parallel algorithm for semantic similarity measures, using MapReduce and WordNet, to detect the documents relevant to a query. Finally, we conduct basic experiments to assess the performance of the proposed approach and note the benefit that Hadoop and MapReduce bring to semantic similarity measures between documents.
Pages: 187-192
Number of pages: 6
Related papers
50 in total
  • [41] Classifying XML documents based on Structure/Content similarity
    Xing, Guangming
    Guo, Jinhua
    Xia, Zhonghang
    COMPARATIVE EVALUATION OF XML INFORMATION RETRIEVAL SYSTEMS, 2007, 4518 : 444 - 457
  • [42] Unsupervised Semantic Similarity Computation between Terms Using Web Documents
    Iosif, Elias
    Potamianos, Alexandros
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (11) : 1637 - 1647
  • [43] Ontology based Semantic Measures in Document Similarity Ranking
    Sridevi, U. K.
    Nagaveni, N.
    2009 INTERNATIONAL CONFERENCE ON ADVANCES IN RECENT TECHNOLOGIES IN COMMUNICATION AND COMPUTING (ARTCOM 2009), 2009, : 482 - +
  • [44] Image similarity measures based on weak semantic embedding
    Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030, China
    Gaojishu Tongxin, 2006, 1 (27-31):
  • [45] An Ontology Based Approach to Measuring the Semantic Similarity between Information Objects in Personal Information Collections
    Shi, Lei
    Setchi, Rossitza
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT I, 2010, 6276 : 617 - 626
  • [46] HDSW: Semantic sensor network system based on hadoop
    Zhang, Xiaoming, 1600, Science and Engineering Research Support Society (09):
  • [47] A peer-to-peer information retrieval system based on semantic similarity model
    Zhu, Kun-Peng
    Xu, Zhi-Ming
    Wang, Xiao-Long
    Zhao, Yu-Ming
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 4038 - 4043
  • [48] Similarity measures for XML documents based on kernel matrix learning
    Institute of Computer Science and Technology, Peking University, Beijing 100871, China
    Unknown
    Ruan Jian Xue Bao, 2006, 5 (991-1000):
  • [49] Link-based similarity measures for the classification of Web documents
    Calado, P
    Cristo, M
    Gonçalves, MA
    de Moura, ES
    Ribeiro-Neto, B
    Ziviani, N
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2006, 57 (02): : 208 - 221
  • [50] A semantic similarity computation method for virtual resources in cloud manufacturing environment based on information content
    Zhang, Zhe
    Chen, Youling
    Wang, Xu
    JOURNAL OF MANUFACTURING SYSTEMS, 2021, 59 : 646 - 660