A study on acoustic parameter selection strategies to improve deep learning-based speech synthesis

Hyeonjoo Kang, Young Sun Joo, Inseon Jang, Chunghyun Ahn, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we investigate how the performance of a deep learning-based speech synthesis (DLSS) system varies with the configuration of its output acoustic parameters. Our method is mainly applicable to vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the independence assumption of the source-filter model for the spectral and fundamental frequency (F0) parameters, we propose a reliable network architecture for training acoustic parameters. In particular, the F0 parameter suffers from high fluctuation and an extremely low number of dimensions. To relieve these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome a lack of training data, which is a frequent issue with multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
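As a rough illustration of the context-window idea mentioned in the abstract, the sketch below stacks each F0 frame with its neighbouring frames, turning a one-dimensional per-frame value into a wider vector. The abstract does not give the window size, the edge handling, or how unvoiced frames are treated, so the function name `stack_context`, the `left`/`right` parameters, the edge-replication padding, and the use of log F0 are all assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def stack_context(frames: np.ndarray, left: int = 2, right: int = 2) -> np.ndarray:
    """Stack each frame with its neighbours into one wider vector.

    frames: array of shape (T,) or (T, D), one row per analysis frame.
    Returns an array of shape (T, (left + 1 + right) * D).
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim == 1:
        frames = frames[:, None]  # treat a plain F0 track as (T, 1)
    # Assumption: repeat the first and last frame so edge frames
    # still receive a full window.
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    num_frames = frames.shape[0]
    # Column block k holds the frame at offset (k - left) from the centre.
    shifted = [padded[k:k + num_frames] for k in range(left + 1 + right)]
    return np.concatenate(shifted, axis=1)


# Hypothetical voiced-only F0 track in Hz, converted to log F0.
log_f0 = np.log(np.array([110.0, 112.0, 115.0, 113.0, 111.0]))
windowed = stack_context(log_f0, left=1, right=1)
print(windowed.shape)  # (5, 3)
```

With `left=1, right=1`, each row holds the previous, current, and next log-F0 value, so the one-dimensional F0 stream becomes three-dimensional per frame. Whether such a window is applied to the network's output targets or elsewhere is not stated in the abstract.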

Original language: English
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 618-622
Number of pages: 5
ISBN (Electronic): 9781728132488
DOIs: https://doi.org/10.1109/APSIPAASC47483.2019.9023146
Publication status: Published - 2019 Nov
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: 2019 Nov 18 to 2019 Nov 21

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country: China
City: Lanzhou
Period: 19/11/18 to 19/11/21

All Science Journal Classification (ASJC) codes

  • Information Systems

  • Cite this

    Kang, H., Joo, Y. S., Jang, I., Ahn, C., & Kang, H. G. (2019). A study on acoustic parameter selection strategies to improve deep learning-based speech synthesis. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 (pp. 618-622). [9023146] (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPAASC47483.2019.9023146