Abstract
Embeddings, which compress information in raw text into semantics-preserving low-dimensional vectors, have been widely adopted for their efficacy. However, recent research has shown that embeddings can potentially leak private information about sensitive attributes of the text, and in some cases, can be inverted to recover the original input text. To address these growing privacy challenges, we propose a privatization mechanism for embeddings based on homomorphic encryption, to prevent potential leakage of any piece of information in the process of text classification. In particular, our method performs text classification on the encryption of embeddings from state-of-the-art models like BERT, supported by an efficient GPU implementation of the CKKS encryption scheme. We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.
Original language | English |
---|---|
Title of host publication | NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics |
Subtitle of host publication | Human Language Technologies, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 3169-3175 |
Number of pages | 7 |
ISBN (Electronic) | 9781955917711 |
Publication status | Published - 2022 |
Event | 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States. Duration: 2022 Jul 10 → 2022 Jul 15 |
Publication series
Name | NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference |
---|---|
Conference
Conference | 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 2022 Jul 10 → 2022 Jul 15 |
Bibliographical note
Funding Information: Cheon’s team was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [NO.2020-0-00840, Development and Library Implementation of Fully Homomorphic Machine Learning Algorithms supporting Neural Network Learning over Encrypted Data, 50%]. Hwang’s team was supported by Microsoft Research Asia and IITP [(2022-00155958, High Potential Individuals Global Training Program) and (NO.2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University), 50%)].
Publisher Copyright:
© 2022 Association for Computational Linguistics.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Hardware and Architecture
- Information Systems
- Software