Utility-Embraced Microaggregation for Machine Learning Applications

Soobin Lee, Won Yong Shin

Research output: Contribution to journal › Article › peer-review


With access to vast amounts of data, privacy protection is more important than ever. Among various de-identification (anonymization) techniques, k-anonymous microaggregation has been widely studied since it enables us to balance confidentiality against data utility. Despite the many microaggregation methods designed to reduce information loss and/or computational complexity, machine learning (ML) models trained on the resulting aggregated data are often not as effective as expected. Motivated by the fact that ML models can be heavily influenced by training data that are distorted, even slightly, we examine the performance of microaggregation in terms of not only data privacy but also data utility. In this paper, we propose Util-MA, a new utility-embraced microaggregation framework for effective ML applications. Specifically, unlike prior studies that apply microaggregation techniques directly to raw data, we design a unified framework that can potentially enhance data utility while preserving k-anonymity through preprocessing steps including dimensionality reduction and clustering. Using real-world datasets, we empirically demonstrate the superiority of Util-MA over benchmark microaggregation methods in terms of classification accuracy. Moreover, we investigate the importance of preprocessing by measuring key performance indicators (KPIs) of clustering; the clustering stage of Util-MA leads to high classification performance when the clustering results substantially coincide with the ground truth labels. We also establish a close relationship between the KPIs of clustering and the classification accuracies, which tends to emerge when Util-MA shows a gain over the benchmark method. Our framework is microaggregation-model-agnostic; thus, the underlying microaggregation model can be chosen according to one's needs and ML tasks.
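The abstract describes the pipeline only at a high level (dimensionality reduction, then clustering, then microaggregation under k-anonymity), so the following is a minimal sketch of that general idea rather than the authors' Util-MA implementation. PCA, a plain k-means loop, and fixed-size grouping along the first principal component are stand-ins chosen for illustration; the function names and the parameters `n_components`, `n_clusters`, and `seed` are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per record."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

def chunk(order, k):
    """Split ordered indices into consecutive groups of size >= k.

    Assumes len(order) >= k; the last group absorbs the remainder.
    """
    n_groups = len(order) // k
    return np.split(order, [i * k for i in range(1, n_groups)])

def cluster_then_microaggregate(X, k, n_components=2, n_clusters=2, seed=0):
    """Assumed pipeline: reduce, cluster, then average groups of >= k records."""
    X = np.asarray(X, dtype=float)
    if len(X) < k:
        raise ValueError("need at least k records")
    Z = pca_project(X, n_components)
    labels = kmeans(Z, n_clusters, seed=seed)

    groups, leftover = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        idx = idx[np.argsort(Z[idx, 0])]  # order along first component
        if len(idx) >= k:
            groups.extend(chunk(idx, k))
        else:
            leftover.append(idx)  # cluster too small to form a group alone

    if leftover:
        rest = np.concatenate(leftover)
        if len(rest) >= k:
            groups.extend(chunk(rest, k))
        else:
            # Fewer than k leftovers: fold them into an existing group.
            groups[-1] = np.concatenate([groups[-1], rest])

    # Release each record as the mean of its group (in the original space).
    released = np.empty_like(X)
    for g in groups:
        released[g] = X[g].mean(axis=0)
    return released, groups

# Toy usage on synthetic data (not from the paper).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(5, 1, (17, 4))])
released_demo, groups_demo = cluster_then_microaggregate(X_demo, k=3)
print(min(len(g) for g in groups_demo))
```

Because every group holds at least k records and all of them are released as the same averaged vector, each released record is indistinguishable from at least k - 1 others on the aggregated attributes. Grouping within clusters, rather than across the whole dataset, is what keeps averaged records from mixing points that the clustering stage separated.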

Original language: English
Pages (from-to): 64535-64546
Number of pages: 12
Journal: IEEE Access
Publication status: Published - 2022

Bibliographical note

Funding Information:
This work was supported in part by the National Research Foundation of Korea (NRF) Grant by the Korean Government through MSIT under Grant 2021R1A2C3004345, in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP).

Publisher Copyright:
© 2013 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)
  • Electrical and Electronic Engineering

