DEEP LEARNING BASED SENSITIVE DATA DETECTION

Times cited: 2
Author
Chong, Peng [1]
Affiliation
[1] University of Electronic Science and Technology of China, School of Computer Science and Engineering, Chengdu 610000, People's Republic of China
Source
2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP) | 2022
Keywords
Sensitive data detection; Data anonymization; Deep learning; Cyber intelligence; Privacy
DOI
10.1109/ICCWAMTIP56608.2022.10016592
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The growing popularity of cutting-edge technologies such as IoT, 5G, and blockchain makes protecting sensitive data increasingly challenging, as data volumes grow and the number of regulatory policies increases. To protect sensitive data properly, it is essential to identify it first, so that data anonymization techniques can then be applied correctly and produce good-quality results. This work focuses on proactive sensitive data identification, classification, and anonymization using machine learning techniques. We first investigate sensitive data extraction from both structured and unstructured data, using BERT models and regular expressions to identify sensitive data in real time. In addition, we propose a comprehensive sensitive data detection framework that combines a BERT model with regular expressions and achieves high precision and good generalization from a relatively small training corpus. Experimental results demonstrate the effectiveness of the proposed solution.
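The abstract names the ingredients of the framework (regular expressions, a BERT model, structured and unstructured inputs) but not how they are combined. The following is a minimal sketch of one plausible combination, not the author's implementation. The regex patterns, the overlap rule that prefers regex hits over model predictions, the 0.8 column threshold, and all function names (`regex_spans`, `merge_spans`, `detect`, `anonymize`, `classify_column`, `toy_tagger`) are hypothetical. `toy_tagger` is a hard-coded stand-in for a fine-tuned BERT token-classification model so that the example runs without any model download.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "EMAIL", "PERSON"
    source: str  # "regex" or "model"


Tagger = Callable[[str], List[Span]]

# Hypothetical patterns for illustration; the paper's actual rules are not given.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b1[3-9]\d{9}\b"),  # 11-digit mainland China mobile format
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def regex_spans(text: str) -> List[Span]:
    """Rule-based pass: every pattern match becomes a candidate span."""
    return [
        Span(m.start(), m.end(), label, "regex")
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def merge_spans(candidates: List[Span]) -> List[Span]:
    """Resolve overlaps: prefer regex hits, then longer spans; keep non-overlapping ones."""
    ordered = sorted(candidates, key=lambda s: (s.source != "regex", -(s.end - s.start), s.start))
    kept: List[Span] = []
    for s in ordered:
        if all(s.end <= k.start or k.end <= s.start for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


def detect(text: str, model_tagger: Tagger) -> List[Span]:
    """Combine regex candidates with spans predicted by a (BERT-style) tagger."""
    return merge_spans(regex_spans(text) + model_tagger(text))


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each detected span with its label, e.g. 'alice@x.com' -> '[EMAIL]'."""
    parts, pos = [], 0
    for s in spans:
        parts.append(text[pos:s.start])
        parts.append(f"[{s.label}]")
        pos = s.end
    parts.append(text[pos:])
    return "".join(parts)


def classify_column(values: Iterable[Optional[str]], threshold: float = 0.8) -> Optional[str]:
    """Structured data: label a column if enough non-empty cells fully match one pattern."""
    cells = [v.strip() for v in values if v and v.strip()]
    if not cells:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in cells if pattern.fullmatch(v))
        if hits / len(cells) >= threshold:
            return label
    return None


def toy_tagger(text: str) -> List[Span]:
    """Hypothetical stand-in for a fine-tuned BERT NER model: a hard-coded name lookup."""
    spans = []
    for name in ("Alice Wang",):
        i = text.find(name)
        if i != -1:
            spans.append(Span(i, i + len(name), "PERSON", "model"))
    return spans


if __name__ == "__main__":
    text = "Contact Alice Wang at alice@example.com or 13812345678."
    print(anonymize(text, detect(text, toy_tagger)))
    # -> Contact [PERSON] at [EMAIL] or [PHONE].
    print(classify_column(["a@b.com", "c@d.org", "", None]))  # -> EMAIL
```

Preferring regex matches when spans overlap is one simple design choice. It trusts high-precision, format-bound rules (emails, phone numbers) and leaves context-dependent entities such as person names to the learned model. A real deployment would replace `toy_tagger` with a BERT token classifier whose BIO tags are mapped back to character offsets.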
Pages: 6
References (17 in total)
[1] Alferidah Dhuha Khalid, 2020, 2020 International Conference on Computational Intelligence (ICCI), P103, DOI 10.1109/ICCI51257.2020.9247722
[2] [Anonymous], 2008, P 25 INT C MACH LEAR, DOI 10.1145/1390156.1390177
[3] Chalapathy R, 2016, arXiv:1609.07585
[4] De Salve A, Mori P, Ricci L, 2018, A survey on privacy in decentralized online social networks, COMPUTER SCIENCE REVIEW, V27, P154-176
[5] Devlin J, 2019, arXiv:1810.04805, DOI 10.48550/arXiv.1810.04805
[6] Dogan RI, Leaman R, Lu Z, 2014, NCBI disease corpus: A resource for disease name recognition and concept normalization, JOURNAL OF BIOMEDICAL INFORMATICS, V47, P1-10
[7] Elijah AV, 2019, INT J ADV COMPUT SC, V10, P520
[8] Khan J, SCI PROGRAMMING-NETH, V2022
[9] Li J, IEEE T KNOWL DATA EN, V34, P202
[10] Liu Y.-I., 2022, AR4 CLIMATE CHANGE 2, V46