SparkAC: Fine-Grained Access Control in Spark for Secure Data Sharing and Analytics

Cited by: 3
Authors
Xue, Tao [1 ,2 ]
Wen, Yu [1 ]
Luo, Bo [3 ]
Li, Gang [4 ]
Li, Yingjiu [5 ]
Zhang, Boyang [1 ]
Zheng, Yang [1 ]
Hu, Yanfei [1 ]
Meng, Dan [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100045, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 101408, Peoples R China
[3] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[4] Deakin Univ, Ctr Cyber Secur Res & Innovat, Geelong, Vic 3217, Australia
[5] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
Keywords
Sparks; Access control; Data analysis; Data models; Big Data; Optimization; Hospitals; Spark; big data; access control; data sharing; data protection; purpose; BIG-DATA; FLOW;
DOI
10.1109/TDSC.2022.3149544
CLC number (Chinese Library Classification)
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
With the development of computing and communication technologies, an extremely large amount of data has been collected, stored, utilized, and shared, while new security and privacy challenges arise. Existing access control mechanisms provided by big data platforms have limitations in granularity and expressiveness. In this article, we present SparkAC, a novel access control mechanism for secure data sharing and analysis in Spark. In particular, we first propose a purpose-aware access control (PAAC) model, which introduces the new concepts of data processing purpose and data operation purpose, and an automatic purpose analysis algorithm that identifies purposes from data analytics operations and queries. Moreover, we develop a unified access control mechanism that implements the PAAC model in two modules: GuardSpark++ supports structured data access control in Spark Catalyst, and GuardDAG supports unstructured data access control in Spark core. Finally, we evaluate GuardSpark++ and GuardDAG with multiple data sources, applications, and data analytics engines. Experimental results show that SparkAC provides effective access control functionality with very small (GuardSpark++) or medium (GuardDAG) performance overhead.
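The abstract does not spell out how a purpose-aware decision is made. The following is a minimal sketch of the general idea only, not SparkAC's PAAC model or its purpose analysis algorithm. It assumes a hypothetical purpose hierarchy in which a purpose inherits allow/deny rules from its ancestors, a hypothetical table mapping analytics operations to operation purposes, and a hypothetical per-column policy; all names (`PURPOSE_PARENT`, `OPERATION_PURPOSE`, `ColumnPolicy`, `check_access`) are illustrative.

```python
from dataclasses import dataclass

# Hypothetical purpose hierarchy (child -> parent); "general" is the root.
PURPOSE_PARENT = {
    "healthcare": "general",
    "treatment": "healthcare",
    "research": "healthcare",
    "marketing": "general",
}

# Hypothetical mapping from analytics operations to operation purposes.
OPERATION_PURPOSE = {
    "select": "direct-read",
    "filter": "direct-read",
    "avg": "aggregate",
    "count": "aggregate",
    "sum": "aggregate",
}


def ancestors(purpose):
    """Return the purpose together with all of its ancestors."""
    chain = [purpose]
    while purpose in PURPOSE_PARENT:
        purpose = PURPOSE_PARENT[purpose]
        chain.append(purpose)
    return chain


def complies(access_purpose, allowed, prohibited):
    """Simplified rule: a purpose complies if it or an ancestor is allowed
    and neither it nor any ancestor is prohibited."""
    lineage = set(ancestors(access_purpose))
    return bool(lineage & allowed) and not (lineage & prohibited)


@dataclass
class ColumnPolicy:
    allowed_purposes: set
    prohibited_purposes: set
    allowed_operation_purposes: set


def check_access(policy, processing_purpose, operations):
    """Grant access only if the declared processing purpose complies and
    every operation maps to a permitted operation purpose."""
    if not complies(processing_purpose, policy.allowed_purposes,
                    policy.prohibited_purposes):
        return False
    for op in operations:
        # Unknown operations map to None, which is never permitted (deny).
        if OPERATION_PURPOSE.get(op) not in policy.allowed_operation_purposes:
            return False
    return True


# A diagnosis column that may only be aggregated, and only for healthcare.
diagnosis = ColumnPolicy(
    allowed_purposes={"healthcare"},
    prohibited_purposes={"marketing"},
    allowed_operation_purposes={"aggregate"},
)

print(check_access(diagnosis, "research", ["count"]))   # True
print(check_access(diagnosis, "research", ["select"]))  # False: raw read
print(check_access(diagnosis, "marketing", ["count"]))  # False: purpose not allowed
```

Separating the declared processing purpose from the purposes inferred from the operations themselves lets the same column be released for statistics while still refusing raw reads. Real deployments would derive the operation list from the query plan rather than take it as input.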
Pages: 1104-1123
Number of pages: 20