Soft representation learning for sparse transfer

Haeju Park, Jinyoung Yeo, Gengyu Wang, Seung Won Hwang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Transfer learning is effective for improving performance on related tasks, and multi-task learning (MTL) and cross-lingual learning (CLL) are important instances of it. This paper argues that hard parameter sharing, which hard-codes the layers shared across different tasks or languages, cannot generalize well when sharing with a loosely related task. Such a case, which we call sparse transfer, might actually hurt performance, a phenomenon known as negative transfer. Our contribution is to use adversarial training across tasks to “soft-code” shared and private spaces, so that the shared space does not become too sparse. In CLL, our proposed architecture also addresses the additional challenge of dealing with low-quality input.

Original language: English
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1560-1568
Number of pages: 9
ISBN (Electronic): 9781950737482
Publication status: Published - 2020
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: 2019 Jul 28 - 2019 Aug 2

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country: Italy
City: Florence
Period: 19/7/28 - 19/8/2

Bibliographical note

Funding Information:
This work is supported by Microsoft Research Asia and an IITP grant funded by the Korean government (MSIT, 2017-0-01779, XAI). Hwang is the corresponding author.

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science (all)
  • Linguistics and Language
