Comparison of Data Processing Tools in Hadoop

Cited by: 0
Authors
Sachdeva, Karan [1]
Lamba, Japtej Singh [1]
Sinha, Vishal [1]
Singh, Neetu [1]
Affiliation
[1] Guru Tegh Bahadur Inst Technol, Comp Sci & Engn, New Delhi, India
Source
2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER AND OPTIMIZATION TECHNIQUES (ICEECCOT) | 2016
Keywords
Big Data; Hadoop; MapReduce; Pig; Hive; Hadoop Distributed File System (HDFS)
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Ever since computers began to evolve, the amount of data generated has grown along with them. Storage costs rose at first, but advances in science and technology have since reduced them drastically. As a result, data generation has increased exponentially: by the end of 2014, 90% of all the data in existence had been generated since 2012. These massive amounts of data are usually stored as data sets known as Big Data. The concern is not storing this data but processing the data that is continuously being generated. Data falls into three categories: Structured Data, Semi-Structured Data and Unstructured Data. 80% of the data generated around the world from sources such as social media is unstructured, primarily text. Structured data is information arranged in titled columns and rows, which data mining tools can easily process and order. Semi-structured data does not conform to either of the previous models or to generic relational database models, but contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data. To process such gargantuan amounts of data, Hadoop offers several tools, namely MapReduce, Pig and Hive.
Pages: 238-242
Page count: 5
Related papers
50 in total
  • [31] Exploratory Research on Developing Hadoop-based Data Analytics Tools
    Palit, Henry Novianus
    Dewi, Lily Puspa
    Handojo, Andreas
    Basuki, Kenny
    Mirabel, Mikiavonty Endrawati
    2017 INTERNATIONAL CONFERENCE ON SOFT COMPUTING, INTELLIGENT SYSTEM AND INFORMATION TECHNOLOGY (ICSIIT), 2017, : 160 - 166
  • [32] Present and importance of the implementation of Big Data using the Hadoop and Spark tools
    Montoya Suarez, Lina
    Gil Restrepo, Gustavo Andres
    REVISTA DIGITAL LAMPSAKOS, 2018, (19): : 67 - 72
  • [33] Analysis of Big Data Storage Tools for Data Lakes based on Apache Hadoop Platform
    Belov, Vladimir
    Nikulchev, Evgeny
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (08) : 551 - 557
  • [34] New and Existing Approaches Reviewing of Big Data Analysis with Hadoop Tools
    Mutasher, Watheq Ghanim
    Aljuboori, Abbas Fadhil
    BAGHDAD SCIENCE JOURNAL, 2022, 19 (04) : 887 - 898
  • [35] Effective processing and integration of large data sets in the Hadoop environment
    Drzymala, Pawel
    Welfle, Henryk
    Drzymala, Agnieszka
    PRZEGLAD ELEKTROTECHNICZNY, 2019, 95 (01): : 29 - 32
  • [36] Case Study of Scientific Data Processing on a Cloud Using Hadoop
    Zhang, Chen
    De Sterck, Hans
    Aboulnaga, Ashraf
    Djambazian, Haig
    Sladek, Rob
    HIGH PERFORMANCE COMPUTING SYSTEMS AND APPLICATIONS, 2010, 5976 : 400 - +
  • [37] A Data Locality Optimization Algorithm for Large-scale Data Processing in Hadoop
    Zhao, Yanrong
    Wang, Weiping
    Meng, Dan
    Yang, Xiufeng
    Zhang, Shubin
    Li, Jun
    Guan, Gang
    2012 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2012, : 655 - 661
  • [38] Processing Technology of Massive Human Health Data Based on Hadoop
    Liu, Miao
    Yu, Junsheng
    Chen, Zhijiao
    Guo, Jinglin
    Zhao, Jun
    PROCEEDINGS OF THE 2016 6TH INTERNATIONAL CONFERENCE ON MACHINERY, MATERIALS, ENVIRONMENT, BIOTECHNOLOGY AND COMPUTER (MMEBC), 2016, 88 : 1389 - 1394
  • [39] Processing of Big Educational Data in the Cloud Using Apache Hadoop
    Machova, Renata
    Komarkova, Jitka
    Lnenicka, Martin
    INTERNATIONAL CONFERENCE ON INFORMATION SOCIETY (I-SOCIETY 2016), 2016, : 46 - 49
  • [40] An overview and an Approach for Graph Data Processing using Hadoop MapReduce
    Talan, Pooja P.
    Sharma, Kartik U.
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC 2018), 2018, : 59 - 63