A Fast and Lightweight Speech Synthesis Model based on FastSpeech2

Huu Kim Nguyen, Kihyuk Jeong, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present a fast and lightweight speech synthesis model that is suitable for on-device applications. By leveraging long-short range attention, depth-wise separable convolution, and linear attention, we significantly reduce the model size and complexity of the baseline FastSpeech2-based Transformer framework. Whereas the baseline model requires O(N^{2}) computations for its attention and convolution operations because of nested loops, our proposed model requires only O(N) computations, because each nested loop is replaced with two cascaded single loops. Experimental results show that our proposed model generates speech with a real-time factor of 0.26 and requires only 10.4 million parameters. Despite the reduction in model size and complexity, the speech quality of our model remains close to that of the baseline.
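The step from a nested loop to two cascaded single loops can be illustrated with a minimal sketch of kernel-based linear attention. This is not the authors' implementation: the feature map `phi` (elu(x) + 1), the function names, and the pure-Python list representation are all assumptions chosen for illustration. The sketch only shows that, once the attention weight is written as a product of feature maps, a first pass over the keys and a second pass over the queries reproduce the result of the nested query-by-key loop.

```python
import math
import random


def phi(x):
    # Assumed positive feature map: elu(x) + 1 (not confirmed by the abstract).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_nested(Q, K, V):
    """O(N^2) in sequence length: every query loops over every key."""
    m = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]  # inner loop over all keys
        norm = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / norm
                    for j in range(m)])
    return out


def attention_linear(Q, K, V):
    """O(N) in sequence length: one pass over keys, then one pass over queries."""
    d, m = len(K[0]), len(V[0])
    S = [[0.0] * m for _ in range(d)]  # running sum of phi(k) v^T
    z = [0.0] * d                      # running sum of phi(k)
    for k, v in zip(K, V):             # first single loop
        fk = phi(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(m):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:                        # second single loop
        fq = phi(q)
        norm = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / norm
                    for j in range(m)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
Q, K, V = (random_matrix(6, 4, rng) for _ in range(3))
nested = attention_nested(Q, K, V)
linear = attention_linear(Q, K, V)
max_diff = max(abs(a - b) for ra, rb in zip(nested, linear) for a, b in zip(ra, rb))
```

Here `max_diff` measures the largest gap between the two outputs; it should be at floating-point rounding level, since both functions compute the same quantity in a different order. The cost of the second version grows with N times the square of the feature dimension rather than with N squared.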

Original language: English
Title of host publication: 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665435536
Publication status: Published - 2021 Jun 27
Event: 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021 - Jeju, Korea, Republic of
Duration: 2021 Jun 27 to 2021 Jun 30

Publication series

Name: 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021

Conference

Conference: 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021
Country/Territory: Korea, Republic of
City: Jeju
Period: 21/6/27 to 21/6/30

Bibliographical note

Funding Information:
The work is supported by Clova Voice, NAVER Corp., Seongnam, Korea.

Publisher Copyright:
© 2021 IEEE.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Electrical and Electronic Engineering
