Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

Cited by: 607
Authors
Hu, Han [1]
Wen, Yonggang [2]
Chua, Tat-Seng [1]
Li, Xuelong [3]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[3] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Ctr Opt Imagery Anal & Learning, State Key Lab Transient Opt & Photon, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China; National Research Foundation of Singapore;
Keywords
Big data analytics; cloud computing; data acquisition; data storage; data analytics; Hadoop; OF-THE-ART; FEATURE GENERATION; ENERGY-EFFICIENT; MAP-REDUCE; PERFORMANCE; MAPREDUCE; CHALLENGES; FRAMEWORK; NETWORKS; INDUSTRY;
DOI
10.1109/ACCESS.2014.2332453
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics as compared with traditional data. For instance, big data is commonly unstructured and requires more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.
Pages: 652-687
Page count: 36