Classification datasets are often biased in observations, leaving only a few observations for minority classes. Our key contribution is detecting and reducing Under-represented (U-) and Over-represented (O-) artifacts from dataset imbalance, by proposing a Counterfactual Generative Smoothing approach on both feature-space and data-space, namely CGS_f and CGS_d. Our technical contribution is smoothing majority and minority observations, by sampling a majority seed and transferring to minority. Our proposed approaches not only outperform state-of-the-art methods on both synthetic and real-life datasets, but also effectively reduce both artifact types.
|Title of host publication||CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management|
|Publisher||Association for Computing Machinery|
|Number of pages||5|
|Publication status||Published - 2021 Oct 26|
|Event||30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia|
Duration: 2021 Nov 1 → 2021 Nov 5
|Name||International Conference on Information and Knowledge Management, Proceedings|
|Conference||30th ACM International Conference on Information and Knowledge Management, CIKM 2021|
|Period||21/11/1 → 21/11/5|
Bibliographical note
Funding Information:
∗corresponding author, supported by Microsoft Research Asia and IITP grants (2021-0-01696, High-Potential Individuals Global Training Program, and 2021-0-01343, SNU AI Graduate School)
© 2021 ACM.
All Science Journal Classification (ASJC) codes
- Business, Management and Accounting(all)
- Decision Sciences(all)