Counterfactual Generative Smoothing for Imbalanced Natural Language Classification

Hojae Han, Seungtaek Choi, Myeongho Jeong, Jin Woo Park, Seung Won Hwang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Classification datasets are often biased in observations, leaving only a few observations for minority classes. Our key contribution is detecting and reducing Under-represented (U-) and Over-represented (O-) artifacts from dataset imbalance, by proposing a Counterfactual Generative Smoothing approach on both feature-space and data-space, namely CGS_f and CGS_d. Our technical contribution is smoothing majority and minority observations, by sampling a majority seed and transferring it to the minority. Our proposed approaches not only outperform state-of-the-art methods on both synthetic and real-life datasets but also effectively reduce both artifact types.

Original language: English
Title of host publication: CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 3058-3062
Number of pages: 5
ISBN (Electronic): 9781450384469
Publication status: Published - 2021 Oct 26
Event: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 2021 Nov 1 to 2021 Nov 5

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/Territory: Australia
City: Virtual, Online
Period: 21/11/1 to 21/11/5

Bibliographical note

Funding Information:
∗corresponding author, supported by Microsoft Research Asia and IITP grants (2021-0-01696, High-Potential Individuals Global Training Program, and 2021-0-01343, SNU AI Graduate School)

Publisher Copyright:
© 2021 ACM.

All Science Journal Classification (ASJC) codes

  • Business, Management and Accounting(all)
  • Decision Sciences(all)
