Design and Implement of Distributed Document Clustering Based on MapReduce

Cited by: 0
Authors
Wan, Jian [1]
Yu, Wenming [1]
Xu, Xianghua [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Grid & Serv Comp Lab, Hangzhou 310037, Zhejiang, Peoples R China
Source
PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2009) | 2009
Keywords
MapReduce; tfidf; K-Means clustering;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
In this paper, we describe how document clustering for large collections can be implemented efficiently with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. We present the design and implementation of the tfidf and K-Means algorithms on MapReduce. More importantly, we improve the efficiency and effectiveness of the algorithm. Finally, we give the results and some related discussion.
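As a rough illustration of the general technique the abstract names, the sketch below computes tf-idf vectors and runs one K-Means iteration written as a map step, a shuffle, and a reduce step. It is not the paper's implementation: it runs in a single Python process rather than on Hadoop, and the function names (`tfidf`, `kmeans_map`, `kmeans_reduce`, `kmeans_iteration`), the tf-idf weighting formula, and the use of cosine similarity are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """docs: {doc_id: [token, ...]} -> {doc_id: {term: weight}} (hypothetical helper)."""
    n = len(docs)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # Assumed weighting: relative term frequency times log(N / df).
        vectors[doc_id] = {
            t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans_map(vector, centroids):
    """Map step: emit (index of the most similar centroid, vector)."""
    best = max(range(len(centroids)), key=lambda i: cosine(vector, centroids[i]))
    return best, vector


def kmeans_reduce(vectors):
    """Reduce step: average the vectors assigned to one centroid."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_iteration(vectors, centroids):
    """One iteration: map every document, group by key, reduce each group."""
    groups = defaultdict(list)  # stands in for the framework's shuffle phase
    for v in vectors.values():
        key, value = kmeans_map(v, centroids)
        groups[key].append(value)
    # A centroid that received no documents is kept unchanged.
    return [
        kmeans_reduce(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    docs = {
        "a": ["map", "reduce", "hadoop"],
        "b": ["map", "reduce", "cluster"],
        "c": ["cat", "dog"],
        "d": ["cat", "dog", "bird"],
    }
    vecs = tfidf(docs)
    new_centroids = kmeans_iteration(vecs, [vecs["a"], vecs["c"]])
    print([sorted(c) for c in new_centroids])
```

In a real MapReduce job the loop in `kmeans_iteration` would be replaced by the framework's own shuffle, and the iteration would be repeated as a sequence of jobs until the centroids stop changing.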
Pages: 278-280
Page count: 3
Related papers
50 in total
  • [31] Distributed Noun Attribute Based on its First Appearance for Text Document Clustering
    Vijayalakshmi, S.
    Manimegalai, D.
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC), 2014, : 780 - 784
  • [32] Efficient Pre-Processing for Enhanced Semantics Based Distributed Document Clustering
    Shah, Neepa
    Mahajan, Sunita
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 338 - 343
  • [33] Efficient Distributed Density Peaks for Clustering Large Data Sets in MapReduce
    Zhang, Yanfeng
    Chen, Shimin
    Yu, Ge
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3218 - 3230
  • [34] QUALITY BASED CLUSTERING USING MAPREDUCE FRAMEWORK
    Gowri, R.
    Rathipriya, R.
    PROCEEDINGS OF 2016 ONLINE INTERNATIONAL CONFERENCE ON GREEN ENGINEERING AND TECHNOLOGIES (IC-GET), 2016,
  • [35] A Big Graph Clustering Algorithm Based on MapReduce
    Leng, Yonglin
    Zhang, Qingchen
    MODERN TECHNOLOGIES IN MATERIALS, MECHANICS AND INTELLIGENT SYSTEMS, 2014, 1049 : 1467 - +
  • [36] Parallel Black Hole Clustering Based on MapReduce
    Tsai, Chun-Wei
    Hsieh, Cheng-Han
    Chiang, Ming-Chao
    2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS, 2015, : 2543 - 2548
  • [37] A Parallel Clustering Method Study Based on MapReduce
    Sun Zhanquan
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON CLOUD COMPUTING AND INFORMATION SECURITY (CCIS 2013), 2013, 52 : 416 - 419
  • [38] A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets
    Ankita Sinha
    Prasanta K. Jana
    The Journal of Supercomputing, 2018, 74 : 1562 - 1579
  • [39] A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets
    Sinha, Ankita
    Jana, Prasanta K.
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (04): : 1562 - 1579
  • [40] Distributed document clustering using word-clusters
    Deb, Debzani
    Angryk, Rafal A.
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, VOLS 1 AND 2, 2007, : 376 - 383