Online Scheduling and Interference Alleviation for Low-Latency, High-Throughput Processing of Data Streams

Cited by: 24
Authors
Buddhika, Thilina [1]
Stern, Ryan [1]
Lindburg, Kira [1]
Ericson, Kathleen [1]
Pallickara, Shrideep [1]
Affiliations
[1] Colorado State Univ, Department Comp Sci, Ft Collins, CO 80523 USA
Funding
U.S. National Science Foundation
Keywords
Low-latency stream processing; online scheduling; data intensive computing; RESOURCE; INTERNET;
DOI
10.1109/TPDS.2017.2723403
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data streams occur naturally in several observational settings and often need to be processed with low latency. Streams pose unique challenges: they have no preset lifetimes, their traffic may be bursty, and their data arrival rates can be quite high. Furthermore, stream processing computations are generally stateful: the outcome of processing a data stream packet depends on the state that builds up within the computation over multiple, successive rounds of execution. As the number of streams increases, stream processing computations need to be orchestrated over a collection of machines. Achieving timeliness and high throughput in such settings is a challenge. Optimal scheduling of stream processing computations is an instance of the resource-constrained scheduling problem and, depending on the precise formulation of the problem, can be characterized as either NP-Complete or NP-Hard. We have designed an algorithm for online scheduling of stream processing computations. Our algorithm focuses on reducing interference that adversely impacts the performance of stream processing computations. Our measure of interference is based on stream packet arrivals at a particular machine, the accompanying resource utilization (encompassing CPU, memory, and network utilization), and the resource utilization at the machines comprising the cluster. Our algorithm continuously and incrementally detects the interference experienced by computations and performs migrations to alleviate it.
Pages: 3553-3569
Page count: 17