An improved content splitting and merging algorithm for Hadoop clusters using component analysis and hamming distance

Cited by: 0
Authors
Singh B. [1]
Verma H.K. [1]
Kumar G. [2]
Kim H.-J. [3]
Affiliations
[1] Department of Computer Science and Engineering, Dr. B.R. Ambedkar National Institute of Technology, Jalandhar
[2] Department of Computer Science and Engineering, Lovely Professional University, Jalandhar
[3] Business Administration Research Institute, Sungshin W. University, 2 Bomun-ro 34da gil, Seongbuk-gu, Seoul
Keywords
Big data; Cluster; Hadoop; Merge; Split
DOI
10.1504/ijtpm.2019.10025765
Abstract
Distributed storage and processing of big data sets have become an integral component of data science. As technology progresses towards the Internet of Things (IoT), big data becomes more important, and processing such data requires close attention to availability and accuracy. Various studies to date have addressed the efficient use of content splitting and merging in data processing, but they fall short in generating proper clusters in Hadoop. In this paper, we present an efficient approach to the splitting and merging process in data processing. We use component analysis and Hamming distance to generate the clusters depending on the split values, which is novel in this domain of work. The experimental results show that the proposed approach provides better efficiency in terms of discrete clusters and time consumption. © 2019 Inderscience Enterprises Ltd.
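The abstract names Hamming distance as one ingredient for forming clusters from split values but does not describe the procedure. The following is only a minimal sketch of the general idea, not the paper's algorithm: it assumes each split is summarised as an equal-length bit-string signature (a hypothetical representation), and it groups signatures greedily against the first member of each cluster. The names `hamming_distance`, `group_by_hamming`, and `threshold` are illustrative, and the component analysis step is omitted entirely.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def group_by_hamming(signatures: List[str], threshold: int) -> List[List[str]]:
    """Greedy grouping (an assumption, not the paper's method):
    a signature joins the first cluster whose first member is within
    `threshold` bit flips; otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for sig in signatures:
        for cluster in clusters:
            if hamming_distance(sig, cluster[0]) <= threshold:
                cluster.append(sig)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([sig])
    return clusters


# Hypothetical split signatures, made up for illustration only.
splits = ["10110", "10111", "00001", "00011", "11110"]
clusters = group_by_hamming(splits, threshold=1)
print(clusters)
```

With a threshold of one bit, signatures that differ from a cluster's first member in a single position end up together, so the example yields two discrete groups. A real pipeline would need to define how signatures are derived from the split content, which the abstract does not specify.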
Pages: 392-404
Page count: 12