Comparison of Data Processing Tools in Hadoop

Cited by: 0
Authors
Sachdeva, Karan [1]
Lamba, Japtej Singh [1]
Sinha, Vishal [1]
Singh, Neetu [1]
Affiliation
[1] Guru Tegh Bahadur Institute of Technology, Computer Science & Engineering, New Delhi, India
Source
2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER AND OPTIMIZATION TECHNIQUES (ICEECCOT) | 2016
Keywords
Big Data; Hadoop; MapReduce; Pig; Hive; Hadoop Distributed File System (HDFS)
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Since the advent of computers, the amount of data being generated has grown steadily. Storage costs rose along with it, but advances in science and technology have since reduced those costs drastically, and as a result data generation has grown exponentially: 90% of all the data that existed by the end of 2014 had been generated since 2012. Such massive data sets are called Big Data. The main concern is not storing this data but processing the data that is continuously being generated. Data is commonly divided into three categories: Structured Data, Semi-Structured Data and Unstructured Data. About 80% of the data generated worldwide, from sources such as social media, is unstructured, mostly in text form, while structured data is information organized into labeled rows and columns that data mining tools can easily process and sort. Semi-structured data conforms neither to these models nor to generic relational database models, but contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data. To process data at this enormous scale, Hadoop offers several tools, namely MapReduce, Pig and Hive.
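To make the MapReduce model mentioned in the abstract more concrete, here is a minimal, single-process Python sketch of word counting over unstructured text. It only simulates the map, shuffle and reduce phases locally and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every lower-cased,
    # whitespace-separated token in one line of unstructured text.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key. In Hadoop the framework
    # performs this step between the map and reduce phases; here it is local.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values collected for one key into a single count.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases: map every line, group by word, reduce each group.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Big Data needs Hadoop", "Hadoop runs MapReduce", "Pig and Hive run on Hadoop"]
    print(word_count(docs))
```

On a real cluster, the map and reduce logic would run in parallel across HDFS blocks (for example as Java classes, or as scripts via Hadoop Streaming), while Pig and Hive let users express similar aggregations in higher-level languages that are compiled into such jobs.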
Pages: 238-242
Page count: 5