We investigate an improved time-frequency trajectory excitation (ITFTE) vocoder for deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) systems. The ITFTE is a linear predictive coding-based vocoder, where a pitch-dependent excitation signal is represented by a periodicity distribution in a time-frequency domain. The proposed method significantly improves the parameterization efficiency of ITFTE vocoder for the DNN-based SPSS system, even if its dimension changes due to the inherent nature of pitch variation. By utilizing an orthogonality property of discrete cosine transform, we not only accurately reconstruct the ITFTE parameters but also improve the perceptual quality of synthesized speech. Objective and subjective test results confirm that the proposed method provides superior synthesized speech compared to the previous system.
|Number of pages||5|
|Journal||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|Publication status||Published - 2016|
|Event||17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States|
Duration: 2016 Sep 8 → 2016 Sep 16
Bibliographical notePublisher Copyright:
Copyright © 2016 ISCA.
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Modelling and Simulation