Big data storage technologies: a survey

Cited by: 0
Authors
Aisha Siddiqa
Ahmad Karim
Abdullah Gani
Affiliations
[1] University of Malaya, Faculty of Computer Science and Information Technology
[2] Bahauddin Zakariya University, Department of Information Technology
Source
Frontiers of Information Technology & Electronic Engineering | 2017 / Vol. 18
Keywords
Big data; Big data storage; NoSQL databases; Distributed databases; CAP theorem; Scalability; Consistency-partition resilience; Availability-partition resilience
DOI
Not available
CLC number
TP311.13
Abstract
Industry is pushing hard to develop more feasible and viable tools for storing data that is growing rapidly in volume, velocity, and diversity, termed ‘big data’. The structural shift in storage mechanisms from traditional data management systems to NoSQL technology is driven by the need to meet big data storage requirements. However, the available big data storage technologies fall short of providing consistent, scalable, and available solutions for continuously growing heterogeneous data. Storage is the preliminary step of big data analytics for real-world applications such as scientific experiments, healthcare, social networks, and e-business. So far, Amazon, Google, and Apache are among the industry standards in providing big data storage solutions, yet the literature does not report an in-depth survey of the storage technologies available for big data that investigates the performance and magnitude gains of these technologies. The primary objective of this paper is to conduct a comprehensive investigation of state-of-the-art storage technologies available for big data. A well-defined taxonomy of big data storage technologies is presented to assist data analysts and researchers in understanding and selecting the storage mechanism that best fits their needs. To evaluate the performance of different storage architectures, we compare and analyze the existing approaches using Brewer’s CAP theorem. The significance and applications of storage technologies and their support for other categories are discussed. Several future research challenges are highlighted with the aim of expediting the deployment of a reliable and scalable storage system.
Pages: 1040-1070
Page count: 30