Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Cited by: 2
Authors
Petrolini, Michael [1]
Cagnoni, Stefano [1]
Mordonini, Monica [1]
Affiliations
[1] Univ Parma, Dept Engn & Architecture, Parco Area Sci 181a, I-43124 Parma, Italy
Keywords
GDPR; sensitive data; personal data; natural language processing; BERT; transformers
DOI
10.3390/fi14080228
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and homogenising privacy legislation within the EU. The regulation applies to all companies that process the data of European residents, regardless of where the data are processed or where the company has its registered office, and it imposes strict data-protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage. This is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sexual life or sexual orientation, or physical and mental health. Such data are rarely structured; most often they appear within a document such as an email message, a review or a post. As a result, it is extremely difficult for a company to know whether it holds sensitive data, and it risks failing to protect them properly. The goal of the study described in this paper is to use machine learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular type of sensitive topic a document deals with, so that a company has better knowledge of the data it owns. We expect to make the model described in this paper available as a web service, customized to the private data of possible customers, or even in a free-to-use version based on the freely available data set we have built to train the classifiers.
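As a rough illustration of the two outputs the abstract asks for (whether a document is sensitive, and which sensitive topic it concerns), the sketch below turns one vector of per-class scores into such a decision. It is not the authors' implementation: the label names, the single "non_sensitive" class, the `interpret` function and the assumption that a fine-tuned Transformer emits one logit per label are all hypothetical, chosen only to mirror the categories listed in the abstract.

```python
import math

# Hypothetical label set: one "non_sensitive" class plus the sensitive
# categories named in the abstract. The paper's actual labels may differ.
LABELS = [
    "non_sensitive",
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "sexual_life_or_orientation",
    "health",
]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits):
    """Map one logit per label to a sensitivity flag, topic and confidence.

    `logits` stands in for the output of a fine-tuned Transformer
    classifier (an assumption; no real model is called here).
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "sensitive": LABELS[best] != "non_sensitive",
        "topic": LABELS[best],
        "confidence": probs[best],
    }


if __name__ == "__main__":
    # Made-up scores in which the last label ("health") dominates.
    print(interpret([0.1, 0.0, 0.2, 0.0, 0.0, 0.1, 4.0]))
```

In a real deployment the logits would come from the trained classifier, and a confidence threshold could be added so that borderline documents are routed to human review instead of being labelled automatically.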
Pages: 15