Design and Implement of Distributed Document Clustering Based on MapReduce

Cited by: 0
Authors
Wan, Jian [1]
Yu, Wenming [1]
Xu, Xianghua [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Grid & Serv Comp Lab, Hangzhou 310037, Zhejiang, Peoples R China
Source
PROCEEDINGS OF INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2009), 2009
Keywords
MapReduce; tfidf; K-Means clustering
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we describe how document clustering for large collections can be implemented efficiently with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. The design and implementation of the tfidf and K-Means algorithms on MapReduce are presented. More importantly, we improve the efficiency and effectiveness of the algorithm. Finally, we give the results and some related discussion.
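The abstract does not give implementation details, so the following is only a minimal single-process sketch of how one K-Means iteration over tfidf vectors can be phrased as a map step and a reduce step. It is not the authors' Hadoop code. The weighting tf * log(N / df), the use of cosine similarity, and all function names (`tfidf`, `kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) are assumptions made for illustration, and the in-memory dictionary merely stands in for the shuffle phase of a real MapReduce job.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Turn a list of token lists into sparse {term: weight} vectors.

    Assumed weighting: (term count / doc length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_map(vector, centroids):
    """Map step: emit (index of the most similar centroid, vector)."""
    best = max(range(len(centroids)), key=lambda i: cosine(vector, centroids[i]))
    return best, vector


def kmeans_reduce(vectors):
    """Reduce step: average all vectors assigned to one centroid."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_iteration(vectors, centroids):
    """One K-Means iteration; `groups` simulates the shuffle by key."""
    groups = defaultdict(list)
    for v in vectors:
        key, value = kmeans_map(v, centroids)
        groups[key].append(value)
    # A centroid that received no vectors is kept unchanged.
    return [
        kmeans_reduce(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


docs = [
    ["hadoop", "mapreduce", "cluster"],
    ["hadoop", "mapreduce", "job"],
    ["kmeans", "centroid", "distance"],
    ["kmeans", "centroid", "vector"],
]
vectors = tfidf(docs)
centroids = kmeans_iteration(vectors, [vectors[0], vectors[2]])
```

In an actual Hadoop job the map and reduce functions would run on separate machines and the current centroids would have to be distributed to every mapper before each iteration; the sketch ignores that and only shows the data flow.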
Pages: 278-280
Page count: 3