Efficient Recommendation of De-Identification Policies Using MapReduce

Cited by: 4

Authors:

Ding, Xiaofeng [1]

Wang, Li [1]

Shao, Zhiyuan [1]

Jin, Hai [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Serv Comp Technol & Syst Lab, Wuhan 430074, Hubei, Peoples R China

Source:

IEEE TRANSACTIONS ON BIG DATA | 2019 / Vol. 5 / No. 3

Keywords:

De-identification policy; anonymization; skyline computation; data privacy; SKYLINE; MICRODATA; QUERIES;

DOI:

10.1109/TBDATA.2017.2690660

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In many real-world applications, data owners are required to release their data, since discovering the valuable information hidden in the data is of vital importance. However, re-identification attacks on the AOL and ADULTS datasets have shown that publishing such data directly may pose serious threats to individual privacy. It is therefore urgent to address re-identification risks by recommending effective de-identification policies that guarantee both the privacy and the utility of the data. De-identification policies are one model that can meet these requirements; however, the number of possible policies is exponentially large because of the broad domain of quasi-identifier attributes. To better control the trade-off between data utility and data privacy, skyline computation can be used to select policies, but processing skylines efficiently over a large number of policies remains challenging. In this paper, we propose a parallel algorithm called SKY-FILTER-MR, based on MapReduce, that overcomes this challenge by computing skylines over large-scale de-identification policies represented as bit-strings. To further improve performance, we propose a novel approximate skyline computation scheme that prunes unqualified policies using an approximate domination relationship. With the approximate skyline, filtering in the policy space generation stage becomes much stronger, which effectively reduces the cost of skyline computation over the alternative policies. Extensive experiments on both real-life and synthetic datasets demonstrate that the proposed SKY-FILTER-MR algorithm substantially outperforms the baseline approach, running up to four times faster in the optimal case, and indicate good scalability over large policy sets.
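As an illustration of the skyline idea mentioned in the abstract, the following is a minimal sketch of a partitioned skyline computation in the MapReduce style: each partition computes a local skyline, and the local results are merged into a global one. It is not the authors' SKY-FILTER-MR algorithm and does not include their approximate-skyline pruning. The policy labels, the two scores (re-identification risk and utility loss, both assumed to be "lower is better"), and all function names are hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical policy record: (bit-string label, risk, utility_loss).
# Assumption for this sketch: lower is better on both scores.
Policy = Tuple[str, float, float]


def dominates(a: Policy, b: Policy) -> bool:
    """a dominates b if a is no worse on both scores and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def local_skyline(policies: List[Policy]) -> List[Policy]:
    """Simple nested-loop skyline over one list of policies."""
    skyline: List[Policy] = []
    for p in policies:
        if any(dominates(s, p) for s in skyline):
            continue  # p is dominated, so it cannot be in the skyline
        # Drop points that the new policy dominates, then keep the policy.
        skyline = [s for s in skyline if not dominates(p, s)]
        skyline.append(p)
    return skyline


def partitioned_skyline(policies: List[Policy], num_partitions: int = 2) -> List[Policy]:
    """Map step: local skyline per partition. Reduce step: skyline of the candidates."""
    partitions = [policies[i::num_partitions] for i in range(num_partitions)]
    candidates = [p for part in partitions for p in local_skyline(part)]
    return local_skyline(candidates)


# Made-up example data, for illustration only.
policies = [
    ("0001", 0.9, 0.1),
    ("0011", 0.5, 0.3),
    ("0111", 0.5, 0.5),
    ("1111", 0.1, 0.9),
    ("0101", 0.6, 0.4),
]
result = partitioned_skyline(policies)
print(sorted(label for label, _, _ in result))
```

The merge step is safe because any policy in the global skyline is also in the local skyline of its own partition, so discarding locally dominated policies early never loses a global result.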


Pages: 343-354

Page count: 12
