Intra-class variation reduction of speaker representation in disentanglement framework

Yoohwan Kwon, Soo Whan Chung, Hong Goo Kang

Research output: Contribution to journalConference articlepeer-review

1 Citation (Scopus)

Abstract

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing solely speaker characteristic information in order to be robust in terms of intraspeaker variations. By modifying the network architecture to generate both speaker-related and speaker-unrelated representations, we exploit a learning criterion which minimizes the mutual information between these disentangled embeddings. We also introduce an identity change loss criterion which utilizes a reconstruction error to different utterances spoken by the same speaker. Since the proposed criteria reduce the variation of speaker characteristics caused by changes in background environment or spoken content, the resulting embeddings of each speaker become more consistent. The effectiveness of the proposed method is demonstrated through two tasks; disentanglement performance, and improvement of speaker recognition accuracy compared to the baseline model on a benchmark dataset, VoxCeleb1. Ablation studies also show the impact of each criterion on overall performance.

Original languageEnglish
Pages (from-to)3231-3235
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2020-October
DOIs
Publication statusPublished - 2020
Event21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 2020 Oct 252020 Oct 29

Bibliographical note

Funding Information:
Acknowledgements. This research is sponsored by Naver Corporation.

Publisher Copyright:
Copyright © 2020 ISCA

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Fingerprint

Dive into the research topics of 'Intra-class variation reduction of speaker representation in disentanglement framework'. Together they form a unique fingerprint.

Cite this