Design and Implement of Distributed Document Clustering Based on MapReduce

Times cited: 0
Authors
Wan, Jian [1]
Yu, Wenming [1]
Xu, Xianghua [1]
Affiliation
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Grid & Serv Comp Lab, Hangzhou 310037, Zhejiang, Peoples R China
Source
PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2009) | 2009
Keywords
MapReduce; tfidf; K-Means clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812 (Computer Science and Technology)
Abstract
In this paper, we describe how document clustering for large document collections can be implemented efficiently with MapReduce. Hadoop, an implementation of MapReduce, provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. We present the design and implementation of the tfidf weighting scheme and the K-Means clustering algorithm on MapReduce. More importantly, we improve the efficiency and effectiveness of the algorithm. Finally, we give the results and some related discussion.
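The abstract does not spell out how the jobs are structured, so the following is only a minimal sketch, not the authors' implementation, and it omits their unspecified efficiency and effectiveness improvements. It shows one common way to express tfidf weighting and K-Means iterations as map and reduce functions, with Hadoop's shuffle simulated in memory. The helper names (`run_mapreduce`, `tfidf_vectors`, `kmeans_iteration`, `kmeans`), the normalized term frequency, the weight tf(t,d) * log(N / df(t)), squared Euclidean distance, and explicit seed documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one Hadoop job: map, shuffle (sort + group by key), reduce."""
    intermediate = [pair for record in records for pair in mapper(record)]
    intermediate.sort(key=lambda pair: pair[0])  # stable sort on the key only
    output = []
    for key, group in groupby(intermediate, key=lambda pair: pair[0]):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def tfidf_vectors(docs):
    """docs: {doc_id: text}. Returns sparse tfidf vectors {doc_id: {term: weight}}."""
    n_docs = len(docs)

    def mapper(record):
        doc_id, text = record
        words = text.lower().split()
        for term, count in Counter(words).items():
            # Key = term, so every posting for a term reaches the same reducer.
            yield term, (doc_id, count / len(words))

    def reducer(term, postings):
        idf = math.log(n_docs / len(postings))  # len(postings) is the document frequency
        for doc_id, tf in postings:
            yield doc_id, term, tf * idf

    vectors = defaultdict(dict)
    for doc_id, term, weight in run_mapreduce(docs.items(), mapper, reducer):
        vectors[doc_id][term] = weight
    return dict(vectors)


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in a.keys() | b.keys())


def kmeans_iteration(vectors, centroids):
    """One K-Means iteration as one job; the current centroids are visible to every mapper."""
    def mapper(record):
        _, vec = record
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        yield nearest, vec

    def reducer(cluster_id, members):
        total = defaultdict(float)
        for vec in members:
            for term, weight in vec.items():
                total[term] += weight
        yield cluster_id, {t: w / len(members) for t, w in total.items()}

    new = dict(run_mapreduce(vectors.items(), mapper, reducer))
    # A cluster that received no documents keeps its previous centroid.
    return [new.get(i, centroids[i]) for i in range(len(centroids))]


def kmeans(vectors, seed_ids, max_iter=20):
    """Driver loop: run jobs until the centroids stop changing; returns {doc_id: cluster index}."""
    centroids = [dict(vectors[d]) for d in seed_ids]
    for _ in range(max_iter):
        updated = kmeans_iteration(vectors, centroids)
        if updated == centroids:
            break
        centroids = updated
    return {d: min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
            for d, v in vectors.items()}


docs = {
    "d1": "hadoop mapreduce cluster hadoop",
    "d2": "mapreduce hadoop jobs",
    "d3": "kmeans clustering centroids",
    "d4": "kmeans centroids distance",
}
vecs = tfidf_vectors(docs)
print(kmeans(vecs, seed_ids=["d1", "d3"]))  # d1 and d2 share cluster 0; d3 and d4 share cluster 1
```

In this sketch each K-Means iteration is a complete MapReduce job, and the driver stops once a job leaves the centroids unchanged. On a real Hadoop cluster the current centroids would have to be shipped to every mapper as side data (the distributed cache is one commonly used mechanism), which is why the centroid set is kept small relative to the document vectors.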
Pages: 278-280
Number of pages: 3
Related papers (50 in total)
  • [41] VDoc+: a virtual document based approach for matching large ontologies using MapReduce
    Zhang, Hang
    Hu, Wei
    Qu, Yu-zhong
    Journal of Zhejiang University SCIENCE C, 2012, 13 : 257 - 267
  • [42] MapReduce based Method for Big Data Semantic Clustering
    Yang, Jie
    Li, Xiaoping
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 2814 - 2819
  • [43] Design of Distributed Parallel Computing Using by MapReduce/MPI Technology
    Akhmed-Zaki, Darkhan
    Danaev, Nargozy
    Matkerim, Bazargul
    Bektemessov, Amanzhol
    PARALLEL COMPUTING TECHNOLOGIES (PACT 2013), 2013, 7979 : 139 - 148
  • [44] MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Ahmad, Shahbaaz
    2016 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2016, : 192 - 198
  • [45] Clustering on Big Data Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Khan, Shahbaz
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 789 - 795
  • [46] Software Design and Implementation for MapReduce across Distributed Data Centers
    Wang, Lizhe
    Tao, Jie
    Ma, Yan
    Khan, Samee U.
    Kolodziej, Joanna
    Chen, Dan
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 : 85 - 90
  • [47] A New Distributed Name Disambiguation System Based on MapReduce
    Liu, Pengfei
    Ge, Sheng
    PROCEEDINGS OF 2012 IEEE 14TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, 2012, : 550 - 554
  • [48] Brief Announcement: Fast and Better Distributed MapReduce Algorithms for k-Center Clustering
    Im, Sungjin
    Moseley, Benjamin
    SPAA'15: PROCEEDINGS OF THE 27TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2015, : 65 - 67
  • [49] Distributed Extreme Learning Machine with kernels based on MapReduce
    Bi, Xin
    Zhao, Xiangguo
    Wang, Guoren
    Zhang, Pan
    Wang, Chao
    NEUROCOMPUTING, 2015, 149 : 456 - 463
  • [50] A MAPREDUCE BASED DISTRIBUTED LSI FOR SCALABLE INFORMATION RETRIEVAL
    Liu, Yang
    Li, Maozhen
    Khan, Mukhtaj
    Qi, Man
    COMPUTING AND INFORMATICS, 2014, 33 (02) : 259 - 280