Abstract
In this paper, we propose a method to effectively determine representative style embeddings for each emotion class, improving a global style token-based end-to-end speech synthesis system. The expressiveness of the conventional approach was limited because it used only one representative style embedding per emotion. We overcome this limitation by extracting multiple representatives per emotion with a k-means clustering algorithm. Listening test results show that the proposed method clearly expresses each emotion while distinguishing it from the others.
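The clustering step described in the abstract can be sketched as follows. This is a minimal illustration only: the embedding dimensionality, the number of representatives `k`, and the function names are assumptions for the sketch, not the paper's actual implementation, and a plain k-means stands in for whatever configuration the authors used.

```python
import numpy as np

def kmeans(embeddings, k, iters=50, seed=0):
    """Plain k-means: return k centroids of the given embedding set."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random samples.
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its cluster; keep the old
        # centroid if a cluster happens to be empty.
        for j in range(k):
            members = embeddings[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

def representatives_per_emotion(style_embeddings_by_emotion, k):
    """For each emotion class, cluster its style embeddings and use the
    k cluster centroids as that emotion's representative embeddings."""
    return {emotion: kmeans(np.asarray(embs), k)
            for emotion, embs in style_embeddings_by_emotion.items()}
```

At synthesis time, one of the `k` centroids for the target emotion would then condition the decoder in place of the single per-emotion representative used by the conventional approach.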
Original language | English |
---|---|
Pages (from-to) | 614-620 |
Number of pages | 7 |
Journal | Journal of the Acoustical Society of Korea |
Volume | 38 |
Issue number | 5 |
DOIs | |
Publication status | Published - 2019 |
Bibliographical note
Publisher Copyright: © 2019 Acoustical Society of Korea. All rights reserved.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Instrumentation
- Acoustics and Ultrasonics
- Applied Mathematics
- Speech and Hearing