Extracting-Transforming-Loading Modeling Approach for Big Data Analytics

Cited by: 6
Authors
Bala, Mahfoud [1]
Boussaid, Omar [2]
Alimazighi, Zaia [3]
Affiliations
[1] Saad Dahleb Univ, Dept Informat, Blida 1, Algeria
[2] Univ Lyon 1, Lab ERIC, Lyon, France
[3] Univ Sci & Technol Houari Boumediene, Dept Comp Sci, Bab Ezzouar, Algeria
Keywords
Big Data; Data Warehousing; Extracting-Transforming-Loading; Hadoop; MapReduce; Parallel and Distributed Processing
DOI
10.4018/IJDSST.2016100104
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Owing to their widespread use, the Internet, Web 2.0, and digital sensors generate data in non-traditional volumes (at terabyte and petabyte scale). Big data, characterized by the four V's, has brought new challenges given the limited capabilities of traditional computing systems. This paper aims to provide solutions that can cope with very large data in Decision-Support Systems (DSSs). Specifically, for the data integration phase, the authors propose a conceptual modeling approach for parallel and distributed Extracting-Transforming-Loading (ETL) processes. Among the complexity dimensions of big data, this study focuses on data volume, with the goal of ensuring good performance for ETL processes. The authors' approach makes it possible to anticipate parallelization and distribution issues at an early stage of Data Warehouse (DW) projects. They have implemented an ETL platform called Parallel-ETL (P-ETL for short) and conducted experiments with it. Their performance analysis reveals that the proposed approach speeds up ETL processes by up to 33%, with a linear improvement rate.
Pages: 50-69
Page count: 20