Abstract
The recent advent of cross-lingual embeddings, such as multilingual BERT (mBERT), provides a strong baseline for zero-shot cross-lingual transfer. A growing body of research also aims to reduce the alignment discrepancy of cross-lingual embeddings between source and target languages by generating code-switched sentences, in which randomly selected words in the source language are substituted with their counterparts in the target language. Although these approaches improve performance, naively code-switched sentences can have inherent limitations. In this paper, we propose SCOPA, a novel technique to improve the performance of zero-shot cross-lingual transfer. Instead of using the embeddings of code-switched sentences directly, SCOPA mixes them softly with the embeddings of the original sentences. In addition, SCOPA uses a pairwise alignment objective, which aligns the vector differences of word pairs rather than word-level embeddings, in order to transfer contextualized information between languages while preserving language-specific information. Experiments on the PAWS-X and MLDoc datasets show the effectiveness of SCOPA.
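
As a rough illustration of the two ideas named in the abstract, the following Python sketch shows one way soft mixing and a pairwise-difference alignment term could be written. It is not the authors' implementation: the convex-combination form, the mixing weight `lam`, the squared-error loss, the function names, and the assumption that row `i` of both matrices holds the embeddings of an aligned word in each language are all hypothetical choices made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def soft_mix(orig_emb, cs_emb, lam=0.5):
    """Blend original and code-switched embeddings.

    Assumption: a simple convex combination with weight `lam`;
    the abstract only says the embeddings are mixed "softly".
    """
    return (1.0 - lam) * orig_emb + lam * cs_emb


def pairwise_difference_loss(src_emb, tgt_emb):
    """Mean squared gap between word-pair difference vectors.

    Assumption: row i of `src_emb` and row i of `tgt_emb` are
    embeddings of aligned words, and squared error is the distance.
    """
    # diff[i, j] = emb[i] - emb[j] for every pair of words (i, j)
    src_diff = src_emb[:, None, :] - src_emb[None, :, :]
    tgt_diff = tgt_emb[:, None, :] - tgt_emb[None, :, :]
    return float(np.mean((src_diff - tgt_diff) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 8))   # 4 words, 8-dimensional embeddings
    tgt = src + 3.0                 # same geometry, shifted by a constant offset
    mixed = soft_mix(src, tgt, lam=0.25)
    print(mixed.shape)
    print(pairwise_difference_loss(src, tgt))
```

Under these assumptions, a target embedding set that differs from the source only by a constant offset incurs essentially zero pairwise loss, which is one reading of how aligning differences rather than the embeddings themselves could leave language-specific information intact.
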
Original language | English |
---|---|
Title of host publication | CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management |
Publisher | Association for Computing Machinery |
Pages | 3176-3180 |
Number of pages | 5 |
ISBN (Electronic) | 9781450384469 |
Publication status | Published - 2021 Oct 26 |
Event | 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia. Duration: 2021 Nov 1 → 2021 Nov 5 |
Publication series
Name | International Conference on Information and Knowledge Management, Proceedings |
---|---|
Conference
Conference | 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 |
---|---|
Country/Territory | Australia |
City | Virtual, Online |
Period | 2021 Nov 1 → 2021 Nov 5 |
Bibliographical note
Funding Information: This work is supported by FriendliAI and IITP grants (ITRC, IITP-2021-2020-0-01789 and SNU AI Graduate School Program 2021-0-01343).
Publisher Copyright: © 2021 ACM.
All Science Journal Classification (ASJC) codes
- Business, Management and Accounting (all)
- Decision Sciences (all)