JRBridge: A Framework of Large-Scale Statistical Computing for R

Cited by: 3
Authors
Xie, Xia [1]
Cao, Jie [1]
Jin, Hai [1]
Ke, Xijiang [1]
Cao, Wenzhi [1]
Affiliation
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
Source
2012 IEEE ASIA-PACIFIC SERVICES COMPUTING CONFERENCE (APSCC) | 2012
Keywords
R Language; JVM; Hadoop; MapReduce; Statistical Computing Method
DOI
10.1109/APSCC.2012.74
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Demand for highly scalable parallel data processing platforms is rising due to an explosion in the number of massive-scale, data-intensive applications in both industry and science. Performing statistical computing over huge data repositories poses a significant challenge to existing statistical software and computational infrastructure. An analysis of various open-source computational infrastructures and their programming-paradigm APIs shows that most of them are JVM-based and expose their APIs as Java interfaces or abstract classes. This paper proposes JRBridge, a generic framework that integrates R with JVM-based computational infrastructures by automatically generating Java API code wrappers around native R code and handling type conversion between the two. Using this framework, we build a distributed statistical computing environment by integrating R with Hadoop. The Hadoop Distributed File System (HDFS) plugin provides a way to store and access datasets with millions of objects, and the MapReduce plugin provides a natural environment for coding MapReduce algorithms in R. Experimental results show that JRBridge scales linearly with dataset size and thus provides a scalable solution for large-scale statistical computing in R.
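The abstract describes wrappers that implement a JVM framework's Java interfaces and delegate the actual work to native R code, converting types in both directions. The sketch below illustrates only that general wrapper pattern under stated assumptions; it is not JRBridge's generated code. `RecordMapper`, `RFunction`, `RMapperWrapper`, and `JRBridgeSketch` are hypothetical names, `RecordMapper` is not Hadoop's real `Mapper` API, and a Java lambda stands in for the R engine so the example runs without R installed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical JVM-side API, standing in for a framework interface (not Hadoop's real Mapper). */
interface RecordMapper {
    List<String> map(String line);
}

/** Hypothetical stand-in for a native R function operating on an R numeric vector. */
interface RFunction extends Function<double[], double[]> {}

/** Sketch of a wrapper: implements the Java API and delegates the work to (simulated) R code. */
final class RMapperWrapper implements RecordMapper {
    private final RFunction rFunction;

    RMapperWrapper(RFunction rFunction) {
        this.rFunction = rFunction;
    }

    @Override
    public List<String> map(String line) {
        double[] input = toRNumeric(line);         // Java -> R type conversion
        double[] output = rFunction.apply(input);  // invoke the (simulated) R code
        return fromRNumeric(output);               // R -> Java type conversion
    }

    /** Parses whitespace-separated numbers into a double[], the analogue of an R numeric vector. */
    static double[] toRNumeric(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Converts an R-style numeric vector back into Java strings for the calling framework. */
    static List<String> fromRNumeric(double[] values) {
        List<String> out = new ArrayList<>(values.length);
        for (double v : values) {
            out.add(Double.toString(v));
        }
        return out;
    }
}

public class JRBridgeSketch {
    public static void main(String[] args) {
        // A Java lambda plays the role of the R function `function(x) x * 2`.
        RFunction timesTwo = x -> {
            double[] y = x.clone();
            for (int i = 0; i < y.length; i++) {
                y[i] *= 2;
            }
            return y;
        };
        RecordMapper mapper = new RMapperWrapper(timesTwo);
        System.out.println(mapper.map("1 2.5 3")); // prints [2.0, 5.0, 6.0]
    }
}
```

Keeping all conversion logic inside the wrapper means the JVM framework only ever sees ordinary Java types. In a real bridge, the `rFunction.apply` call would go through an R embedding or client layer (for example JRI from rJava, or Rserve) rather than a lambda; which mechanism JRBridge itself uses is not stated here.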
Pages: 27-34
Page count: 8