NewTL: Engineering an Extract, Transform, Load (ETL) Software System for Business on a Very Large Scale

Times cited: 2
Authors
Debroy, Vidroha [1]
Brimble, Lance [1]
Yost, Matt [1]
Affiliation
[1] Varidesk Inc, Coppell, TX 75019 USA
Source
33rd Annual ACM Symposium on Applied Computing | 2018
Keywords
Extract; Transform; Load; ETL; Scalability; Reliability; Enterprise Resource Planning; Maintenance
DOI
10.1145/3167132.3167300
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Large-scale ETL (Extract, Transform, Load) software systems are very complex to build, and even harder to maintain, due to their many moving pieces and the varying ownership and responsibilities spread across multiple parties. While some research has been done in the area of ETL, it primarily focuses on techniques for optimizing expensive operations and on identifying common challenges, but sheds little light on how to build a reliable and robust system in practice. In this paper, we discuss NewTL, an ETL software system built completely in-house at Varidesk Inc. that allows for ETL on a very large scale and has facilitated seamless integration between our third-party logistics providers (3PLs) and our ERP (Enterprise Resource Planning) system. In doing so, we provide technical insights on real-world concerns such as speed and scalability, explain our decision to build versus buy, and touch upon other aspects relevant to any ETL system in practice. We also provide metrics and real data (collected both directly by us and through telemetry from external monitoring) that demonstrate the quality of NewTL. We believe this to be the first study of its kind on such a scale; it serves to help other practitioners and to stimulate further research into this highly important field.
Pages: 1568-1575
Number of pages: 8