Towards efficient and secure analysis of large datasets

Cited by: 0
Authors
Cimato, Stelvio [1]
Nicolo, Stefano [1]
Affiliations
[1] Univ Milan, Dipartimento Informat, Milan, Italy
Source
2020 IEEE 44TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2020) | 2020
Keywords
machine learning; privacy preserving techniques; secure multi-party computation
DOI
10.1109/COMPSAC48688.2020.00-68
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
One of the promises of the "big data" revolution is that, through the analysis of large datasets, people will benefit from solutions to many different problems obtained by deploying advanced machine learning models. One challenge of this standard approach is that information needs to be centralized at the data center or on the machine where the training phase is performed, which raises many privacy concerns. In this paper we take a step towards secure and efficient processing of large distributed datasets, where the original data reside at different locations and are processed in a privacy-preserving way. In particular, we rely on available technologies to achieve the secure design of a machine learning model by performing the training phase on encrypted data. The case study we examine focuses on forecasting the energy production of wind farms situated in different locations. We show in detail how the machine learning model is created from the available datasets, compare the results with those produced by previous models, and discuss their performance.
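As a rough illustration of the secure multi-party computation idea named in the keywords, the sketch below uses additive secret sharing to let several sites compute a joint sum without any site disclosing its own value. This is not the authors' protocol, which the abstract does not specify; the function names, the modulus, and the wind-farm readings are all hypothetical and chosen only for the example.

```python
import secrets

# Illustrative modulus for share arithmetic (an assumption, not from the paper).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value mod PRIME."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Simulate n parties jointly summing their private inputs.

    Party i splits its input into n shares and sends share j to party j.
    Each party then adds the shares it received and publishes only that
    partial sum; the partial sums reveal the total but not the inputs.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


# Hypothetical production readings held by three separate wind farms.
farm_outputs = [1200, 950, 1430]
total = secure_sum(farm_outputs)
```

Any single share is a uniformly random number, so a party that sees only its received shares learns nothing about another party's input. Training a model on protected data requires considerably more machinery (secure multiplication, fixed-point encoding), which this sketch does not attempt.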
Pages: 1351-1356
Number of pages: 6
Related papers
50 in total
  • [1] Towards Secure and Efficient Outsourcing of Machine Learning Classification
    Zheng, Yifeng
    Duan, Huayi
    Wang, Cong
    COMPUTER SECURITY - ESORICS 2019, PT I, 2019, 11735 : 22 - 40
  • [2] Subsampling the Concurrent AdaBoost Algorithm: An Efficient Approach for Large Datasets
    Allende-Cid, Hector
    Acuna, Diego
    Allende, Hector
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016, 2017, 10125 : 318 - 325
  • [3] Efficient supervised optimum-path forest classification for large datasets
    Papa, Joao P.
    Falcao, Alexandre X.
    de Albuquerque, Victor Hugo C.
    Tavares, Joao Manuel R. S.
    PATTERN RECOGNITION, 2012, 45 (01) : 512 - 520
  • [4] Secure fuzzy retrieval protocol for multiple datasets
    Zhou, Jie
    Deng, Jiao
    Zeng, Shengke
    He, Mingxing
    Liu, Xingwei
    COMPUTER NETWORKS, 2024, 255
  • [5] Towards a Methodology for Addressing Missingness in Datasets, with an Application to Demographic Health Datasets
    Khangamwa, Gift
    van Zyl, Terence
    van Alten, Clint J.
    ARTIFICIAL INTELLIGENCE RESEARCH, SACAIR 2022, 2022, 1734 : 169 - 186
  • [6] Towards a Secure Peer-to-Peer Federated Learning Framework
    Piotrowski, Tim
    Nochta, Zoltan
    ADVANCES IN SERVICE-ORIENTED AND CLOUD COMPUTING, ESOCC 2022, 2022, 1617 : 19 - 31
  • [7] Efficient disjointness tests for private datasets
    Ye, Qingsong
    Wang, Huaxiong
    Pieprzyk, Josef
    Zhang, Xian-Mo
    INFORMATION SECURITY AND PRIVACY, 2008, 5107 : 155 - 169
  • [8] PARROT is a flexible recurrent neural network framework for analysis of large protein datasets
    Griffith, Daniel
    Holehouse, Alex S.
    ELIFE, 2021, 10
  • [9] Impact of imbalanced features on large datasets
    Albattah, Waleed
    Khan, Rehan Ullah
    FRONTIERS IN BIG DATA, 2025, 8
  • [10] Towards Secure Big Data Analysis via Fully Homomorphic Encryption Algorithms
    Hamza, Rafik
    Hassan, Alzubair
    Ali, Awad
    Bashir, Mohammed Bakri
    Alqhtani, Samar M.
    Tawfeeg, Tawfeeg Mohmmed
    Yousif, Adil
    ENTROPY, 2022, 24 (04)