The big data system, components, tools, and technologies: a survey

Cited by: 0
Authors
T. Ramalingeswara Rao
Pabitra Mitra
Ravindara Bhatt
A. Goswami
Affiliations
[1] Indian Institute of Technology Kharagpur, Theoretical Computer Science Group, Department of Mathematics
[2] Indian Institute of Technology Kharagpur, Department of Computer Science and Engineering
[3] Jaypee University of Information Technology, Department of Computer Science and Engineering
Source
Knowledge and Information Systems | 2019 / Vol. 60
Keywords
Big data; Components of big data system; Distributed file systems; NoSQL databases; Visualization; SQL Query tools; Data analytics;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Traditional databases cannot handle unstructured data or high volumes of real-time data. Diverse, unstructured datasets give rise to big data, and it is laborious to store, manage, process, analyze, visualize, and extract useful insights from these datasets using traditional database approaches. However, many technical approaches exist for refining large heterogeneous datasets in the current big data trend. This paper aims to present a generalized view of a complete big data system, including its several stages and the key components of each stage in processing big data. In particular, we compare and contrast various distributed file systems and MapReduce-supported NoSQL databases with respect to certain parameters in the data management process. Further, we present distinct distributed/cloud-based machine learning (ML) tools that play a key role in designing, developing, and deploying data models. The paper investigates case studies on distributed ML tools such as Mahout, Spark MLlib, and FlinkML. We also classify analytics based on the type of data, domain, and application. We distinguish various visualization tools with respect to three parameters: functionality, analysis capabilities, and supported development environment. Furthermore, we systematically investigate big data tools and technologies (Hadoop 3.0, Spark 2.3), including distributed/cloud-based stream processing tools, in a comparative approach. Moreover, we discuss the functionalities of several SQL query tools on Hadoop based on 10 parameters. Finally, we present some critical points relevant to research directions and opportunities according to the current trend of big data. Investigating infrastructure tools for big data together with recent developments provides a better understanding of how different tools and technologies apply to real-life applications.
Pages: 1165-1245
Page count: 80