Comparison of Data Processing Tools in Hadoop

Authors
Sachdeva, Karan [1]
Lamba, Japtej Singh [1]
Sinha, Vishal [1]
Singh, Neetu [1]
Affiliation
[1] Guru Tegh Bahadur Inst Technol, Comp Sci & Engn, New Delhi, India
Source
2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER AND OPTIMIZATION TECHNIQUES (ICEECCOT) | 2016
Keywords
Big Data; Hadoop; MapReduce; Pig; Hive; Hadoop Distributed File System (HDFS)
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Since the evolution of computers began, the amount of data generated has steadily increased. Storage costs rose at first, but advances in technology and science have drastically reduced them. Owing to this, data generation has increased exponentially. In 2012, 90% of the entire data which existed in the past had been generated by the end of 2014. These massive amounts of data are usually stored as data sets called Big Data. The concern is not the storage of this data but the processing of data that is being continuously generated. Data is divided into three categories: Structured Data, Semi-Structured Data and Unstructured Data. 80% of the data generated around the world from sources such as social media is in unstructured format, primarily text, while structured data is information organized into titled columns and rows, which can be easily processed and ordered by data mining tools. Semi-structured data does not conform to either of the previous models or to generic relational database models, but contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data. To process such gargantuan amounts of data, Hadoop offers different kinds of tools, namely MapReduce, Pig and Hive.
Pages: 238-242
Number of pages: 5