De-Identification of Unstructured Textual Data using Artificial Immune System for Privacy Preserving

Cited by: 1
Authors
Rahmani, Amine [1]
Amine, Abdelmalek [1]
Hamou, Reda Mohamed [1]
Boudia, Mohamed Amine [1]
Bouarara, Hadj Ahmed [2]
Affiliations
[1] Dr Tahar Moulay Univ Saida, Dept Comp Sci, Saida, Algeria
[2] Dr Tahar Moulay Univ Saida, Dept Comp Sci, GeCoDe Lab, Saida, Algeria
Keywords
Big Data; CLONALG; Data Perturbation; De-Identification; Immune Systems; Privacy Preserving
DOI
10.4018/IJDSST.2016100103
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The development of new technologies has brought the world to a tipping point. One of these technologies is big data, which has revolutionized computer science. Big data has brought new challenges, which can be summarized as the aim of creating scalable and efficient services that can process huge amounts of heterogeneous data in a short time while preserving users' privacy. Textual data occupies a large share of the internet, and it can contain information that leads to the identification of users. For this reason, the development of approaches that can detect and remove identifiable information has become a critical research area known as de-identification. This paper tackles the problem of privacy in textual data. The authors' proposed approach uses artificial immune systems and MapReduce to detect and hide identifying words, regardless of their variants, using the user's personal information taken from their profile. After many experiments, the system shows high efficiency in terms of the number of detected words, the way they are hidden, and execution time.
Pages: 34-49
Page count: 16