Comparison of Data Processing Tools in Hadoop

Cited by: 0
Authors
Sachdeva, Karan [1]
Lamba, Japtej Singh [1]
Sinha, Vishal [1]
Singh, Neetu [1]
Affiliations
[1] Guru Tegh Bahadur Inst Technol, Comp Sci & Engn, New Delhi, India
Source
2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER AND OPTIMIZATION TECHNIQUES (ICEECCOT) | 2016
Keywords
Big Data; Hadoop; MapReduce; Pig; Hive; Hadoop Distributed File System (HDFS);
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Since the advent of computing, the volume of data generated has grown steadily. Storage costs initially rose with it, but advances in science and technology have since reduced these costs drastically, and as a result data generation has increased exponentially. By the end of 2014, 90% of all the data in existence had been generated since 2012. Such massive amounts of data are usually stored as data sets referred to as Big Data. The concern is not storing this data but processing it as it is continuously generated. Data is divided into three categories: structured, semi-structured, and unstructured. Of the data generated around the world from sources such as social media, 80% is unstructured, primarily text, whereas structured data is information arranged in titled columns and rows that data mining tools can easily process and order. Semi-structured data does not conform to either of these models or to generic relational database models, but contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data. To process data at this scale, Hadoop offers several tools, namely MapReduce, Pig, and Hive.
Pages: 238-242
Page count: 5
Related papers
50 in total
  • [1] A Comparison of Hadoop Tools for Analyzing Tabular Data
    Tomasic, Ivan
    Rashkovska, Aleksandra
    Depolli, Matjaz
    Trobec, Roman
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2013, 37 (02): 131-138
  • [2] Processing Real World Datasets using Big Data Hadoop Tools
    Deshai, N.
    Sekhar, B. V. D. S.
    Reddy, P. V. G. D. Prasad
    Chakravarthy, V. V. S. S. S.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2020, 79 (07): 631-635
  • [3] Floating Car Data Processing Model Based on Hadoop-GIS Tools
    Deng, Zhu
    Bai, Yuqi
    2016 FIFTH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS (AGRO-GEOINFORMATICS), 2016: 46-49
  • [4] A Comparison of Big Remote Sensing Data Processing with Hadoop MapReduce and Spark
    Chebbi, I.
    Boulila, W.
    Mellouli, N.
    Lamolle, M.
    Farah, I. R.
    2018 4TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP), 2018
  • [5] Big Data Management Processing with Hadoop MapReduce and Spark Technology: A Comparison
    Verma, Ankush
    Mansuri, Ashik Hussain
    Jain, Neelesh
    2016 SYMPOSIUM ON COLOSSAL DATA ANALYSIS AND NETWORKING (CDAN), 2016
  • [6] Processing LIDAR Data with Apache Hadoop
    Ruzicka, Jan
    Orcik, Lukas
    Ruzickova, Katerina
    Kisztner, Juraj
    RISE OF BIG SPATIAL DATA, 2017: 351-358
  • [7] Big data and Spark: Comparison with Hadoop
    Benlachmi, Yassine
    Hasnaoui, Moulay Lahcen
    PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020: 811-817
  • [8] Efficient Big Data Processing in Hadoop MapReduce
    Dittrich, Jens
    Quiane-Ruiz, Jorge-Arnulfo
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (12): 2014-2015
  • [9] Online Data Processing on Cloud and Hadoop Platform
    Akhtar, Ayesha
    Shakir, Muhammad Sohaib
    2017 FOURTH HCT INFORMATION TECHNOLOGY TRENDS (ITT), 2017: 25-29
  • [10] Processing and Analysis of Seismic data in Hadoop Platform
    Chen, Zhuang
    Zhang, Ti
    2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2017