TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations

Cited by: 0
Authors
Zhou, Xin [1 ]
Lu, Yi [5 ,6 ]
Ma, Ruotian [1 ]
Gui, Tao [2 ]
Wang, Yuran [4 ]
Ding, Yong [4 ]
Zhang, Yibo [4 ]
Zhang, Qi [1 ]
Huang, Xuanjing [1 ,3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Fudan Univ, Inst Modern Languages & Linguist, Shanghai, Peoples R China
[3] Int Human Phenome Inst, Shanghai, Peoples R China
[4] Honor Device Co Ltd, Shenzhen, Peoples R China
[5] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[6] Fudan NLP Lab, Shanghai, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In real-world applications, pre-trained language models are typically deployed on the cloud, allowing clients to upload data and perform compute-intensive inference remotely. To avoid sharing sensitive data directly with service providers, clients can upload numerical representations rather than plain text to the cloud. However, recent text reconstruction techniques have demonstrated that it is possible to recover the original words from such representations, suggesting that privacy risks remain. In this paper, we propose TextObfuscator, a novel framework for preserving inference privacy by applying random perturbations to clustered representations. The random perturbations make each word representation indistinguishable from surrounding functionally similar representations, thus obscuring word information while retaining the original word functionality. To achieve this, we utilize prototypes to learn clustered representations, where words of similar functionality are encouraged to be closer to the same prototype during training. Additionally, we design different methods to find prototypes for token-level and sentence-level tasks, which improve performance by incorporating semantic and task information. Experimental results on token and sentence classification tasks show that TextObfuscator achieves improvements over the compared methods without increasing inference cost.
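The mechanism outlined in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering of the two components described: a prototype loss that clusters functionally similar word representations during training, and a random perturbation applied to client-side representations before they are uploaded. The function names, the MSE form of the loss, the Gaussian noise, and the noise scale are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact method) of: (1) a prototype loss that
# pulls functionally similar word representations toward a shared prototype,
# and (2) random perturbation of representations so a word is hidden among
# its functionally similar neighbors.
import torch
import torch.nn.functional as F

def prototype_loss(word_reps, proto_ids, prototypes):
    """Pull each word representation toward its assigned prototype.

    word_reps:  (N, d) hidden states from the client-side encoder layers.
    proto_ids:  (N,)  prototype index assigned to each word (hypothetical
                assignment; the paper derives prototypes differently for
                token-level and sentence-level tasks).
    prototypes: (K, d) learnable prototype vectors.
    """
    assigned = prototypes[proto_ids]          # (N, d) prototype for each word
    return F.mse_loss(word_reps, assigned)

def obfuscate(word_reps, noise_scale=0.1):
    """Add random perturbation before uploading representations to the cloud.

    Because training clusters functionally similar words together, the noisy
    representation stays near its cluster but no longer identifies the word.
    Gaussian noise and its scale are illustrative assumptions.
    """
    return word_reps + noise_scale * torch.randn_like(word_reps)

# Usage sketch: the client encodes locally, obfuscates, then uploads.
reps = torch.randn(8, 768)                    # e.g. 8 tokens, 768-dim states
protos = torch.nn.Parameter(torch.randn(16, 768))
ids = torch.randint(0, 16, (8,))
loss = prototype_loss(reps, ids, protos)      # added to the task loss in training
uploaded = obfuscate(reps)                    # what the service provider sees
```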
Pages: 5459-5473
Page count: 15