Goal-based composition of scalable hybrid analytics for heterogeneous architectures

Times cited: 2
Authors
Coetzee, P. [1]
Jarvis, S. A. [1, 2]
Affiliations
[1] Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England
[2] British Lib, Alan Turing Inst, London, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Data science; Hybrid analytics; Analytic planning; Streaming analysis; Hadoop; Data intensive computing; Heterogeneous compute;
DOI
10.1016/j.jpdc.2016.11.009
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Crafting scalable analytics in order to extract actionable business intelligence is a challenging endeavour, requiring multiple layers of expertise and experience. Often, this expertise is irreconcilably split between an organisation's engineers and subject matter domain experts. Previous approaches to this problem have relied on technically adept users with tool-specific training. Such an approach faces a number of challenges. Expertise: there are few data-analytic subject domain experts with in-depth technical knowledge of compute architectures. Performance: analysts do not generally make full use of the performance and scalability capabilities of the underlying architectures. Heterogeneity: calculating the most performant and scalable mix of real-time (on-line) and batch (off-line) analytics in a problem domain is difficult. Tools: supporting frameworks will often direct several tasks, including composition, planning, code generation, validation, performance tuning and analysis, but do not typically provide end-to-end solutions embedding all of these activities. In this paper, we present a novel semi-automated approach to the composition, planning, code generation and performance tuning of scalable hybrid analytics, using a semantically rich type system which requires little programming expertise from the user. This approach is the first of its kind to permit domain experts with little or no technical expertise to assemble complex and scalable analytics for hybrid on- and off-line analytic environments, with no additional requirement for low-level engineering support. This paper describes (i) an abstract model of analytic assembly and execution, (ii) goal-based planning and (iii) code generation for hybrid on- and off-line analytics.
An implementation of this approach, a system which we call MENDELEEV, is used to (iv) demonstrate the applicability of the technique through a series of case studies, in which a single interface is used to create analytics that can be run simultaneously over on- and off-line environments. Finally, we (v) analyse the performance of the planner and (vi) show that the performance of MENDELEEV's generated code is comparable with that of hand-written analytics. (C) 2016 The Author(s). Published by Elsevier Inc.
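The abstract names goal-based planning over a semantically typed set of analytic components but does not describe how such a planner works. As a rough illustration of the general idea only, and not MENDELEEV's actual planner, the Python sketch below assumes that each component consumes a set of semantic types and produces exactly one type, then runs a breadth-first forward search from the types a user already has to a requested goal type. `Component`, `plan`, `example_components` and the type names (`RawLogLine`, `Country`, and so on) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical analytic component: consumes semantic types, produces one."""
    name: str
    inputs: frozenset
    output: str


def plan(available, goal, components):
    """Breadth-first forward search for a component sequence that derives `goal`.

    The state is the set of semantic types produced so far. Returns a list of
    component names (fewest components first found), [] if the goal is already
    available, or None if the goal cannot be reached.
    """
    start = frozenset(available)
    if goal in start:
        return []
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        for c in components:
            # Applicable once all input types are available and it adds a new type.
            if c.inputs <= state and c.output not in state:
                new_state = state | {c.output}
                new_steps = steps + [c.name]
                if c.output == goal:
                    return new_steps
                if new_state not in seen:
                    seen.add(new_state)
                    queue.append((new_state, new_steps))
    return None


# Hypothetical components for a log-analysis goal.
example_components = [
    Component("parse_log", frozenset({"RawLogLine"}), "LogRecord"),
    Component("extract_ip", frozenset({"LogRecord"}), "IPAddress"),
    Component("geolocate", frozenset({"IPAddress"}), "Country"),
    Component("count_by", frozenset({"Country"}), "CountryHistogram"),
]

if __name__ == "__main__":
    # Prints ['parse_log', 'extract_ip', 'geolocate', 'count_by']
    print(plan({"RawLogLine"}, "CountryHistogram", example_components))
```

Because the search is breadth-first, the first plan found uses the fewest components. A practical planner would also have to handle type hierarchies, components with several outputs, and cost models (for example, choosing between on-line and off-line execution), none of which this sketch attempts.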
Pages: 59-73
Number of pages: 15