Extreme multi-label classification (XMC) aims to find multiple relevant labels for a given sample from a huge label set at industrial scale. The XMC problem inherently poses two challenges, scalability and label sparsity: the number of labels is very large, and the labels follow a long-tail distribution. To address these problems, we propose a novel Mixup-based augmentation method for long-tail labels, called TailMix. Building on a partition-based model, TailMix uses the context vectors generated by the label attention layer. It first selects two context vectors using the inverse propensity scores of labels and a label proximity graph that represents label co-occurrence. It then mixes the two context vectors to generate new samples for the long-tail label, improving accuracy on long-tail labels. Despite its simplicity, TailMix consistently outperforms other augmentation methods on three benchmark datasets, especially on long-tail labels, as measured by two metrics, PSP@k and PSN@k.
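The selection-and-mixing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `(n_labels, dim)` context layout, the sampling rules, and the Beta-distributed mixing weight are all assumptions made for illustration. The inverse propensity formula follows the commonly used form from Jain et al. (2016) with its default constants `a=0.55` and `b=1.5`; whether TailMix uses exactly this form is also an assumption.

```python
import numpy as np

def inverse_propensity(label_freq, n_samples, a=0.55, b=1.5):
    """Inverse propensity per label (Jain et al. 2016 form); rarer labels score higher."""
    c = (np.log(n_samples) - 1.0) * (b + 1.0) ** a
    return 1.0 + c * (label_freq + b) ** (-a)

def cooccurrence_graph(Y):
    """Label-by-label co-occurrence counts from a binary sample-by-label matrix."""
    G = Y.T @ Y
    np.fill_diagonal(G, 0)  # ignore self co-occurrence
    return G

def mix_tail_context(context, positives, inv_prop, graph, rng, alpha=0.4):
    """Mix the context vector of a rare label with that of a co-occurring label.

    context   : (n_labels, dim) context vectors for one sample (hypothetical layout)
    positives : indices of the sample's relevant labels
    Returns (anchor, partner, mixed); the mixed vector would be treated as a
    new training example for the anchor label.
    """
    # Choose the anchor among the positives, favouring rare (tail) labels.
    p = inv_prop[positives] / inv_prop[positives].sum()
    anchor = rng.choice(positives, p=p)
    # Choose the partner among labels that co-occur with the anchor.
    weights = graph[anchor].astype(float)
    if weights.sum() == 0:
        # No neighbours in the graph: fall back to the unmixed vector.
        return anchor, anchor, context[anchor].copy()
    partner = rng.choice(len(weights), p=weights / weights.sum())
    # Mixup-style convex combination with a Beta-distributed weight.
    lam = rng.beta(alpha, alpha)
    mixed = lam * context[anchor] + (1.0 - lam) * context[partner]
    return anchor, partner, mixed

# Toy data: 4 samples, 3 labels; label 0 is a head label, label 2 a tail label.
Y = np.array([[1, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]])
inv_prop = inverse_propensity(Y.sum(axis=0), Y.shape[0])
graph = cooccurrence_graph(Y)
rng = np.random.default_rng(0)
context = rng.normal(size=(3, 4))
anchor, partner, mixed = mix_tail_context(context, np.array([0, 2]), inv_prop, graph, rng)
```

In this sketch the inverse propensity score decides which label gets augmented, while the co-occurrence graph restricts the second vector to a label that plausibly appears alongside it.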
|Title of host publication||CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management|
|Publisher||Association for Computing Machinery|
|Number of pages||5|
|Publication status||Published - 2022 Oct 17|
|Event||31st ACM International Conference on Information and Knowledge Management, CIKM 2022 - Atlanta, United States|
Duration: 2022 Oct 17 → 2022 Oct 21
|Name||International Conference on Information and Knowledge Management, Proceedings|
|Conference||31st ACM International Conference on Information and Knowledge Management, CIKM 2022|
|Period||22/10/17 → 22/10/21|
Bibliographical note
Funding Information:
This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00421, AI Graduate School Support Program (SKKU); IITP-2022-2020-0-01821, ICT Creative Consilience program; and No. 2022-0-00680).
© 2022 ACM.
All Science Journal Classification (ASJC) codes
- Business, Management and Accounting(all)
- Decision Sciences(all)