A study on acoustic parameter selection strategies to improve deep learning-based speech synthesis

Hyeonjoo Kang, Young Sun Joo, Inseon Jang, Chunghyun Ahn, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we investigate how the performance of a deep learning-based speech synthesis (DLSS) system varies with the configuration of its output acoustic parameters. Our method is mainly applicable to vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the independence assumption of the source-filter model for the spectral and fundamental frequency (F0) parameters, we propose a reliable network architecture for training acoustic parameters. In particular, the F0 parameter suffers from high fluctuation and an extremely low number of dimensions. To relieve these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome a lack of training data, which is a frequent issue with multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
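
The abstract does not say how the context window is built, so the following is only a minimal sketch of one common reading: each one-dimensional F0 frame is stacked with its neighbouring frames, which raises the target dimensionality and exposes local trajectory information. The function name `stack_context`, the window sizes, and the edge-padding choice are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def stack_context(f0, left=2, right=2):
    """Stack each F0 frame with its neighbours (hypothetical helper).

    Input shape (T,), output shape (T, left + right + 1).
    Boundary frames are padded by repeating the edge value (an assumption).
    """
    f0 = np.asarray(f0, dtype=np.float64)
    padded = np.pad(f0, (left, right), mode="edge")
    width = left + right + 1
    # Column i holds the sequence shifted by (i - left) frames.
    return np.stack([padded[i:i + len(f0)] for i in range(width)], axis=1)


if __name__ == "__main__":
    contour = [100.0, 102.0, 105.0, 103.0, 101.0]  # toy F0 values in Hz
    windows = stack_context(contour, left=1, right=1)
    print(windows.shape)
    print(windows)
```

With `left=1, right=1`, the middle column of the result is the original contour, and the outer columns hold the previous and next frames.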

Original language: English
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 618-622
Number of pages: 5
ISBN (Electronic): 9781728132488
Publication status: Published - 2019 Nov
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: 2019 Nov 18 to 2019 Nov 21

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country: China
City: Lanzhou
Period: 19/11/18 to 19/11/21

Bibliographical note

Funding Information:
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2019-0-00447, Development

All Science Journal Classification (ASJC) codes

  • Information Systems
