Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption

Garam Lee, Minsoo Kim, Jai Hyun Park, Seung Won Hwang, Jung Hee Cheon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Embeddings, which compress information in raw text into semantics-preserving low-dimensional vectors, have been widely adopted for their efficacy. However, recent research has shown that embeddings can potentially leak private information about sensitive attributes of the text, and in some cases, can be inverted to recover the original input text. To address these growing privacy challenges, we propose a privatization mechanism for embeddings based on homomorphic encryption, to prevent potential leakage of any piece of information in the process of text classification. In particular, our method performs text classification on the encryption of embeddings from state-of-the-art models like BERT, supported by an efficient GPU implementation of the CKKS encryption scheme. We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.

Original language: English
Title of host publication: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 3169-3175
Number of pages: 7
ISBN (Electronic): 9781955917711
Publication status: Published - 2022
Event: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States
Duration: 2022 Jul 10 - 2022 Jul 15

Publication series

Name: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022
Country/Territory: United States
City: Seattle
Period: 22/7/10 - 22/7/15

Bibliographical note

Funding Information:
Cheon’s team was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [NO.2020-0-00840, Development and Library Implementation of Fully Homomorphic Machine Learning Algorithms supporting Neural Network Learning over Encrypted Data, 50%]. Hwang’s team was supported by Microsoft Research Asia and IITP [(2022-00155958, High Potential Individuals Global Training Program) and (NO.2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University), 50%)].

Publisher Copyright:
© 2022 Association for Computational Linguistics.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software
