TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations

Cited by: 0
Authors
Zhou, Xin [1 ]
Lu, Yi [5 ,6 ]
Ma, Ruotian [1 ]
Gui, Tao [2 ]
Wang, Yuran [4 ]
Ding, Yong [4 ]
Zhang, Yibo [4 ]
Zhang, Qi [1 ]
Huang, Xuanjing [1 ,3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Fudan Univ, Inst Modern Languages & Linguist, Shanghai, Peoples R China
[3] Int Human Phenome Inst, Shanghai, Peoples R China
[4] Honor Device Co Ltd, Shenzhen, Peoples R China
[5] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[6] Fudan NLP Lab, Shanghai, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In real-world applications, pre-trained language models are typically deployed on the cloud, allowing clients to upload data and perform compute-intensive inference remotely. To avoid sharing sensitive data directly with service providers, clients can upload numerical representations rather than plain text to the cloud. However, recent text reconstruction techniques have demonstrated that it is possible to recover the original words from such representations, suggesting that privacy risks remain. In this paper, we propose TextObfuscator, a novel framework for preserving inference privacy by applying random perturbations to clustered representations. The random perturbations make each word representation indistinguishable from surrounding functionally similar representations, thus obscuring word information while retaining the original word functionality. To achieve this, we utilize prototypes to learn clustered representations, where words of similar functionality are encouraged to be closer to the same prototype during training. Additionally, we design different methods to find prototypes for token-level and sentence-level tasks, which improve performance by incorporating semantic and task information. Experimental results on token and sentence classification tasks show that TextObfuscator achieves improvements over the compared methods without increasing inference cost.
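The mechanism outlined in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering of the two components described: a prototype loss that clusters functionally similar word representations during training, and a random perturbation applied to client-side representations before they are uploaded. The function names, the MSE form of the loss, the Gaussian noise, and the noise scale are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact method) of: (1) a prototype loss that
# pulls functionally similar word representations toward a shared prototype,
# and (2) random perturbation of representations so a word is hidden among
# its functionally similar neighbors.
import torch
import torch.nn.functional as F

def prototype_loss(word_reps, proto_ids, prototypes):
    """Pull each word representation toward its assigned prototype.

    word_reps:  (N, d) hidden states from the client-side encoder layers.
    proto_ids:  (N,)  prototype index assigned to each word (hypothetical
                assignment; the paper derives prototypes differently for
                token-level and sentence-level tasks).
    prototypes: (K, d) learnable prototype vectors.
    """
    assigned = prototypes[proto_ids]          # (N, d) prototype for each word
    return F.mse_loss(word_reps, assigned)

def obfuscate(word_reps, noise_scale=0.1):
    """Add random perturbation before uploading representations to the cloud.

    Because training clusters functionally similar words together, the noisy
    representation stays near its cluster but no longer identifies the word.
    Gaussian noise and its scale are illustrative assumptions.
    """
    return word_reps + noise_scale * torch.randn_like(word_reps)

# Usage sketch: the client encodes locally, obfuscates, then uploads.
reps = torch.randn(8, 768)                    # e.g. 8 tokens, 768-dim states
protos = torch.nn.Parameter(torch.randn(16, 768))
ids = torch.randint(0, 16, (8,))
loss = prototype_loss(reps, ids, protos)      # added to the task loss in training
uploaded = obfuscate(reps)                    # what the service provider sees
```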
Pages: 5459-5473
Page count: 15