A Study on Garbage Collection Algorithms for Big Data Environments

Cited by: 19
Authors
Bruno, Rodrigo [1 ]
Ferreira, Paulo [1 ]
Affiliations
[1] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
Keywords
Garbage collection; Big Data; processing platforms; storage platform; Java; memory managed runtime; scalability; Big Data environment; REAL-TIME; PERFORMANCE; MAPREDUCE; EFFICIENT; JAVA
DOI
10.1145/3156818
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The need to process and store massive amounts of data (Big Data) is a reality. In areas such as scientific experiments, social network management, credit card fraud detection, targeted advertisement, and financial analysis, massive amounts of information are generated and processed daily to extract valuable, summarized information. Due to their fast development cycle (i.e., less expensive to develop), mainly because of automatic memory management, and rich community resources, managed object-oriented programming languages (e.g., Java) are the first choice to develop Big Data platforms (e.g., Cassandra, Spark) on which such Big Data applications are executed. However, automatic memory management comes at a cost. This cost is introduced by the garbage collector, which is responsible for collecting objects that are no longer being used. Although current (classic) garbage collection algorithms may be applicable to small-scale applications, these algorithms are not appropriate for large-scale Big Data environments, as they do not scale in terms of throughput and pause times. In this work, current Big Data platforms and their memory profiles are studied to understand why classic algorithms (which are still the most commonly used) are not appropriate, and also to analyze recently proposed and relevant memory management algorithms targeted to Big Data environments. The scalability of recent memory management algorithms is characterized in terms of throughput (whether the algorithm improves the throughput of the application) and pause time (whether it reduces the latency of the application) when compared to classic algorithms. The study concludes by presenting a taxonomy of the described works and some open problems, with regard to Big Data memory management, that could be addressed in future work.
Pages: 35