UHRP: Uncertainty-Based Pruning Method for Anonymized Data Linear Regression

Times cited: 1
Authors
Liu, Kun [1]
Liu, Wenyan [1]
Cheng, Junhong [1]
Lu, Xingjian [2]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Software Engn, Shanghai, Peoples R China
[2] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai, Peoples R China
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS | 2019 / Vol. 11448
Funding
National Key Research and Development Program of China;
Keywords
Machine learning; Anonymization; Interval value; PRIVACY; MODEL;
DOI
10.1007/978-3-030-18590-9_2
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Anonymization, a privacy protection technology for data publishing, has been heavily researched over the past twenty years. However, less research has been conducted on making better use of anonymized data for data mining. In this paper, we focus on training a regression model on anonymized data and using the trained model to predict on original samples. Anonymized training instances are generally treated as hyper-rectangles, which differs from most machine learning tasks. We propose several hyper-rectangle vectorization methods that are compatible with both anonymized and original data for model training. Anonymization also introduces additional uncertainty. To address this issue, we propose an Uncertainty-based HyperRectangle Pruning method (UHRP) to reduce the disturbance introduced by anonymized data. In this method, we prune a hyper-rectangle according to its global uncertainty, which is calculated from all of its uncertain attributes. Experiments show that a linear regressor trained on anonymized data can be expected to perform as well as a model trained on the original data under specific conditions. Experimental results also show that our pruning method can further improve the model's performance.
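The abstract does not spell out the vectorization or the uncertainty formula, so the following is only a minimal sketch of the general idea under stated assumptions: each anonymized record is a list of per-attribute intervals, vectorization takes the interval midpoint, and global uncertainty is the mean interval width normalized by a per-attribute domain width. The function names, the threshold `max_uncertainty`, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (low, high) of one generalized attribute
HyperRect = List[Interval]       # one anonymized record

def midpoint_vector(rect: HyperRect) -> List[float]:
    """Vectorize a hyper-rectangle by its center (assumed choice, not the paper's)."""
    return [(lo + hi) / 2.0 for lo, hi in rect]

def global_uncertainty(rect: HyperRect, domain_widths: List[float]) -> float:
    """Mean interval width over all attributes, each normalized by its domain width.

    This formula is an illustrative assumption; the abstract only says the
    global uncertainty is calculated from all uncertain attributes.
    """
    ratios = [(hi - lo) / w for (lo, hi), w in zip(rect, domain_widths)]
    return sum(ratios) / len(ratios)

def prune(rects: List[HyperRect], targets: List[float],
          domain_widths: List[float], max_uncertainty: float):
    """Keep (vector, target) pairs whose global uncertainty is within the threshold."""
    return [
        (midpoint_vector(r), y)
        for r, y in zip(rects, targets)
        if global_uncertainty(r, domain_widths) <= max_uncertainty
    ]

# Toy data with two attributes; the third record is an exact (original) point,
# represented as zero-width intervals so the same code handles both kinds of data.
rects = [
    [(20.0, 30.0), (35.0, 45.0)],   # narrow box, uncertainty 0.125
    [(20.0, 80.0), (0.0, 80.0)],    # wide box, uncertainty 0.875
    [(41.0, 41.0), (40.0, 40.0)],   # exact point, uncertainty 0.0
]
targets = [1.0, 2.0, 3.0]
domain_widths = [80.0, 80.0]        # hypothetical attribute ranges

kept = prune(rects, targets, domain_widths, max_uncertainty=0.5)
print(kept)
```

The surviving vectors and targets could then be passed to any ordinary linear regression routine; the wide second record is dropped because its generalized intervals cover most of each attribute's domain.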
Pages: 19-33
Number of pages: 15