Design and Implement of Distributed Document Clustering Based on MapReduce

Authors
Wan, Jian [1]
Yu, Wenming [1]
Xu, Xianghua [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Grid & Serv Comp Lab, Hangzhou 310037, Zhejiang, Peoples R China
Source
PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2009) | 2009
Keywords
MapReduce; tfidf; K-Means clustering;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we describe how document clustering for large collections can be implemented efficiently with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. We present the design and implementation of the tfidf and K-Means algorithms on MapReduce. More importantly, we improved the efficiency and effectiveness of the algorithm. Finally, we give the results and some related discussion.
Pages: 278-280
Page count: 3