SCOPA: Soft Code-Switching and Pairwise Alignment for Zero-Shot Cross-lingual Transfer

Dohyeon Lee, Jaeseong Lee, Gyewon Lee, Byung Gon Chun, Seung Won Hwang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

The recent advent of cross-lingual embeddings, such as multilingual BERT (mBERT), provides a strong baseline for zero-shot cross-lingual transfer. A growing body of research aims to reduce the alignment discrepancy of cross-lingual embeddings between source and target languages by generating code-switched sentences, in which randomly selected words in the source language are substituted with their counterparts in the target language. Although these approaches improve performance, naively code-switched sentences have inherent limitations. In this paper, we propose SCOPA, a novel technique to improve the performance of zero-shot cross-lingual transfer. Instead of using the embeddings of code-switched sentences directly, SCOPA mixes them softly with the embeddings of the original sentences. In addition, SCOPA uses a pairwise alignment objective, which aligns the vector differences of word pairs instead of word-level embeddings, in order to transfer contextualized information between languages while preserving language-specific information. Experiments on the PAWS-X and MLDoc datasets show the effectiveness of SCOPA.
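The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the interpolation weight `lam`, and the squared-error form of the pairwise loss are assumptions made here for illustration, since the abstract does not specify them. Embeddings are represented as plain lists of floats, one vector per token.

```python
from itertools import combinations


def soft_mix(original, switched, lam=0.5):
    """Interpolate token embeddings of an original and a code-switched sentence.

    `lam` is a hypothetical mixing weight: 0.0 keeps the original embeddings,
    1.0 uses the code-switched embeddings directly.
    """
    if len(original) != len(switched):
        raise ValueError("sentences must have the same number of tokens")
    return [
        [(1.0 - lam) * o + lam * s for o, s in zip(orig_vec, sw_vec)]
        for orig_vec, sw_vec in zip(original, switched)
    ]


def pairwise_alignment_loss(source, target):
    """Mean squared distance between difference vectors of every word pair.

    For aligned words i and j, compares (source[i] - source[j]) with
    (target[i] - target[j]) rather than comparing source[i] with target[i].
    The squared-error form is an assumption, not taken from the paper.
    """
    if len(source) != len(target):
        raise ValueError("source and target must contain aligned word lists")
    pairs = list(combinations(range(len(source)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        for s_i, s_j, t_i, t_j in zip(source[i], source[j], target[i], target[j]):
            total += ((s_i - s_j) - (t_i - t_j)) ** 2
    return total / len(pairs)
```

Under this formulation, target embeddings that equal the source embeddings plus a constant offset incur zero loss, because the offset cancels in each difference vector. That is one possible reading of how aligning pairwise differences could leave language-specific information intact, though the abstract does not state this mechanism explicitly.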

Original language: English
Title of host publication: CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 3176-3180
Number of pages: 5
ISBN (Electronic): 9781450384469
Publication status: Published - 2021 Oct 26
Event: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 2021 Nov 1 to 2021 Nov 5

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/Territory: Australia
City: Virtual, Online
Period: 21/11/1 to 21/11/5

Bibliographical note

Funding Information:
This work is supported by FriendliAI and IITP grants (ITRC, IITP-2021-2020-0-01789 and SNU AI Graduate School Program 2021-0-01343).

Publisher Copyright:
© 2021 ACM.

All Science Journal Classification (ASJC) codes

  • Business, Management and Accounting (all)
  • Decision Sciences (all)
