An Information-Theoretic Approach to Individual Sequential Data Sanitization

Times cited: 9
Authors
Bonomi, Luca [1]
Fan, Liyue [2]
Jin, Hongxia [3]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Univ Southern Calif, Los Angeles, CA USA
[3] Samsung Res Amer, San Jose, CA USA
Source
PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16) | 2016
Keywords
Data Sanitization; Sequential Patterns; Mutual Information
DOI
10.1145/2835776.2835828
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained personal data, such as location check-ins, web histories, and physical activity records, is now generated continuously and at large scale. These data sequences are typically shared with untrusted parties for data analysis and promotional services. However, individually generated sequential data contains behavioral patterns and may disclose sensitive information if it is not properly sanitized. Moreover, sanitization techniques can degrade the utility of the released sequence. In this paper, we study the problem of sanitizing an individual's sequential data with minimum utility loss, given user-specified sensitive patterns. We propose a privacy notion based on information theory and sanitize sequence data via generalization. We show that the resulting optimization problem is hard and develop two efficient heuristic solutions. Extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of our solutions.
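The abstract names the two ingredients, user-specified sensitive patterns and sanitization via generalization, but does not state the paper's privacy definition or its two heuristics. The sketch below is only a toy illustration built on assumptions introduced here: a hypothetical one-level taxonomy (`TAXONOMY`), a rule that a sensitive pattern is disclosed whenever it occurs as an ordered subsequence (gaps allowed), and a utility-loss proxy equal to the number of generalized positions. It is not the authors' information-theoretic criterion or either of their algorithms.

```python
# Hypothetical one-level taxonomy: each specific item maps to a coarser category.
# (Illustrative assumption; not taken from the paper.)
TAXONOMY = {
    "cafe_A": "food",
    "cafe_B": "food",
    "clinic_X": "health",
    "pharmacy_Y": "health",
    "gym_Z": "fitness",
}


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the first match, so the order is enforced.
    return all(item in it for item in pattern)


def generalize(sequence, positions):
    """Replace the items at `positions` with their parent category, if any."""
    return [TAXONOMY.get(item, item) if i in positions else item
            for i, item in enumerate(sequence)]


def greedy_sanitize(sequence, sensitive_patterns):
    """Toy greedy sanitizer (an assumption, not the paper's heuristic).

    Scans left to right and generalizes items that belong to some sensitive
    pattern until no sensitive pattern occurs in the released sequence.
    Returns the released sequence and the number of generalized positions,
    used here as a crude stand-in for utility loss.
    """
    positions = set()
    released = list(sequence)
    for i, item in enumerate(sequence):
        if not any(is_subsequence(p, released) for p in sensitive_patterns):
            break  # every sensitive pattern is already hidden
        if item in TAXONOMY and any(item in p for p in sensitive_patterns):
            positions.add(i)
            released = generalize(sequence, positions)
    return released, len(positions)


if __name__ == "__main__":
    seq = ["cafe_A", "clinic_X", "gym_Z", "pharmacy_Y"]
    sensitive = [["clinic_X", "pharmacy_Y"]]
    released, cost = greedy_sanitize(seq, sensitive)
    print(released, cost)
```

Generalizing `clinic_X` to `health` is enough to break the ordered pattern `clinic_X -> pharmacy_Y`, so the sketch stops after a single generalization. A real method would instead weigh which positions to generalize against a principled utility and privacy measure, which is what the paper's optimization problem and heuristics address.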
Pages: 337-346
Number of pages: 10