SparkAC: Fine-Grained Access Control in Spark for Secure Data Sharing and Analytics

Cited by: 3
Authors
Xue, Tao [1, 2]
Wen, Yu [1]
Luo, Bo [3]
Li, Gang [4]
Li, Yingjiu [5]
Zhang, Boyang [1]
Zheng, Yang [1]
Hu, Yanfei [1]
Meng, Dan [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100045, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 101408, Peoples R China
[3] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[4] Deakin Univ, Ctr Cyber Secur Res & Innovat, Geelong, Vic 3217, Australia
[5] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
Keywords
Sparks; Access control; Data analysis; Data models; Big Data; Optimization; Hospitals; Spark; big data; access control; data sharing; data protection; purpose; BIG-DATA; FLOW
DOI
10.1109/TDSC.2022.3149544
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the development of computing and communication technologies, an extremely large amount of data has been collected, stored, utilized, and shared, while new security and privacy challenges arise. Existing access control mechanisms provided by big data platforms have limitations in granularity and expressiveness. In this article, we present SparkAC, a novel access control mechanism for secure data sharing and analysis in Spark. In particular, we first propose a purpose-aware access control (PAAC) model, which introduces the new concepts of data processing purpose and data operation purpose, and an automatic purpose analysis algorithm that identifies purposes from data analytics operations and queries. Moreover, we develop a unified access control mechanism that implements the PAAC model in two modules: GuardSpark++ supports structured data access control in Spark Catalyst, and GuardDAG supports unstructured data access control in Spark Core. Finally, we evaluate GuardSpark++ and GuardDAG with multiple data sources, applications, and data analytics engines. Experimental results show that SparkAC provides effective access control functionalities with very small (GuardSpark++) or medium (GuardDAG) performance overhead.
Pages: 1104-1123
Number of pages: 20