Comparison of Data Processing Tools in Hadoop

Cited by: 0

Authors:

Sachdeva, Karan [1]

Lamba, Japtej Singh [1]

Sinha, Vishal [1]

Singh, Neetu [1]

Affiliation:

[1] Guru Tegh Bahadur Inst Technol, Comp Sci & Engn, New Delhi, India

Source:

2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER AND OPTIMIZATION TECHNIQUES (ICEECCOT) | 2016

Keywords:

Big Data; Hadoop; MapReduce; Pig; Hive; Hadoop Distributed File System (HDFS);

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

Ever since computers came into use, the amount of data being generated has grown steadily. Storage costs rose accordingly, but advances in technology and science have since reduced them drastically, and as a result data generation has increased exponentially. By the end of 2014, 90% of all the data that had ever existed had been generated since 2012. These massive volumes of data are usually stored as data sets, which are collectively called Big Data. The main concern is not storing this data but processing the data that is continuously being generated. Data is commonly divided into three categories: Structured Data, Semi-Structured Data and Unstructured Data. 80% of the data generated around the world, from sources such as social media, is unstructured and mostly in text form. Structured data, by contrast, is information organized into rows and titled columns that data mining tools can easily process and sort. Semi-structured data conforms neither to these models nor to generic relational database models, but it contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data. To process such gargantuan amounts of data, Hadoop offers several tools, namely MapReduce, Pig and Hive.
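To make the three data categories concrete, here is a minimal sketch (not taken from the paper) that encodes the same hypothetical review records as structured CSV, semi-structured JSON and unstructured text, using only Python's standard library. The sample records and the helper names `parse_structured`, `parse_semi_structured` and `word_counts` are invented for illustration.

```python
import csv
import io
import json
import re

# Hypothetical sample data, invented for illustration (not from the paper).

# Structured: rows and titled columns, like a relational table.
STRUCTURED = "user,product,rating\nasha,phone,4\nravi,laptop,5\n"

# Semi-structured: no fixed table schema, but keys (tags) mark semantic
# elements and nesting expresses a hierarchy of records and fields.
SEMI_STRUCTURED = """
[
  {"user": "asha", "review": {"product": "phone", "rating": 4}},
  {"user": "ravi", "review": {"product": "laptop", "rating": 5,
                              "tags": ["fast", "light"]}}
]
"""

# Unstructured: free text with no markers; meaning must be extracted.
UNSTRUCTURED = "Asha loved her new phone. Ravi says the laptop is fast and light!"


def parse_structured(text):
    """Column names come from the header row, so each row maps directly to fields."""
    return [
        {"user": row["user"], "product": row["product"], "rating": int(row["rating"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def parse_semi_structured(text):
    """Fields are located by following known tags down the record hierarchy."""
    return [
        {
            "user": rec["user"],
            "product": rec["review"]["product"],
            "rating": rec["review"]["rating"],
        }
        for rec in json.loads(text)
    ]


def word_counts(text):
    """With no schema, the simplest direct processing is to tokenize and count words."""
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    print(parse_structured(STRUCTURED) == parse_semi_structured(SEMI_STRUCTURED))  # True
    print(word_counts(UNSTRUCTURED)["laptop"])  # 1
```

The structured and semi-structured inputs yield identical records, but the JSON path works only because the tag names are known in advance. The unstructured sentence yields nothing beyond word frequencies without further analysis. At Big Data scale, that kind of per-word aggregation is the classic workload handed to a framework such as MapReduce.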


Pages: 238-242

Page count: 5
