A Study on Garbage Collection Algorithms for Big Data Environments

Cited by: 19
Authors
Bruno, Rodrigo [1]
Ferreira, Paulo [1]
Affiliations
[1] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
Keywords
Garbage collection; Big Data; processing platforms; storage platform; Java; memory managed runtime; scalability; Big Data environment; REAL-TIME; PERFORMANCE; MAPREDUCE; EFFICIENT; JAVA
DOI
10.1145/3156818
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The need to process and store massive amounts of data (Big Data) is a reality. In areas such as scientific experiments, social network management, credit card fraud detection, targeted advertisement, and financial analysis, massive amounts of information are generated and processed daily to extract valuable, summarized information. Due to their fast development cycle (i.e., less expensive to develop), mainly because of automatic memory management, and rich community resources, managed object-oriented programming languages (e.g., Java) are the first choice to develop Big Data platforms (e.g., Cassandra, Spark) on which such Big Data applications are executed. However, automatic memory management comes at a cost. This cost is introduced by the garbage collector, which is responsible for collecting objects that are no longer being used. Although current (classic) garbage collection algorithms may be applicable to small-scale applications, these algorithms are not appropriate for large-scale Big Data environments, as they do not scale in terms of throughput and pause times. In this work, current Big Data platforms and their memory profiles are studied to understand why classic algorithms (which are still the most commonly used) are not appropriate, and also to analyze recently proposed and relevant memory management algorithms targeted at Big Data environments. The scalability of recent memory management algorithms is characterized, in comparison with classic algorithms, in terms of throughput (whether the algorithm improves the throughput of the application) and pause time (whether it reduces the latency of the application). The study concludes by presenting a taxonomy of the described works and some open problems in Big Data memory management that could be addressed in future work.
Pages: 35