A NoSQL Data Model For Scalable Big Data Workflow Execution

Cited by: 9
Authors
Mohan, Aravind [1 ]
Ebrahimi, Mahdi [1 ]
Lu, Shiyong [1 ]
Kotov, Alexander [1 ]
Affiliations
[1] Wayne State Univ, Detroit, MI 48202 USA
Source
2016 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2016 | 2016
Keywords
Big Data Workflows; NoSQL; Clouds;
DOI
10.1109/BigDataCongress.2016.15
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
While big data workflows have been proposed recently as the next-generation data-centric workflow paradigm to process and analyze data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. Meanwhile, although NoSQL data models have emerged as a new category of data models, they are optimized for storing and querying large datasets, not for ad hoc data analysis in which data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution (in contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure); 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of input datasets; 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of our proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
Pages: 52-59
Page count: 8