SDPA: An Optimizer for Program Analysis of Data-Parallel Applications

Cited by: 4
Authors
Wang, Fei [1 ]
Shi, Xuanhua [1 ]
Yu, Dongxiao [1 ]
Ke, Zhixiang [1 ]
Jin, Hai [1 ]
Wu, Song [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Big Data Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China
Source
IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018
Funding
National Key Research and Development Program of China;
Keywords
data-parallel applications; user defined function; phase; program analysis; EFFICIENT;
DOI
10.1109/HPCC/SmartCity/DSS.2018.00034
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data-parallel applications have become prevalent with the rapid development of big data technologies. Performance is one of the most important metrics for these applications, and program analysis is a commonly used approach to program optimization. However, because the big data programming model includes many complex operations, such as data partitioning, distribution, and parallelization, every data-parallel application, even a simple one, is tied to a large amount of complex framework code. Analyzing the application directly is therefore likely to consume a large amount of time without producing effective results. Hence, we present an optimizer, called SDPA, to accelerate and simplify the program analysis of data-parallel applications. SDPA exploits an important feature of data-parallel applications: their execution proceeds in distinct stages. We implement SDPA to accelerate program analysis of Spark applications. Extensive experiments are conducted to evaluate the performance and coverage of SDPA on classical benchmark applications and real-world applications selected from MLlib. The evaluation results show that SDPA can 1) decrease the preprocessing time by 96.4% to 98.8%, 2) reduce the analysis time by approximately 99.8% compared with analyzing the entire data-parallel application directly, and 3) cover the majority of real-world applications.
Pages: 14-21
Page count: 8