This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem of the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC), the proposed algorithm efficiently models excitation, even if its dimension is varied. Objective and subjective test results verify that the proposed algorithm provides not only robustness to the training process but also naturalness to the synthesized speech.
|Title of host publication||2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||5|
|Publication status||Published - 2015 Aug 4|
|Event||40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia|
Duration: 2014 Apr 19 → 2014 Apr 24
|Name||ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings|
|Other||40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015|
|Period||14/4/19 → 14/4/24|
Bibliographical notePublisher Copyright:
© 2015 IEEE.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering