This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process considers only the global characteristics of the entire set of training data and does not explicitly consider local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, which are then trained with a shared-hidden-layer-based MCL algorithm. Because the proposed MCL method efficiently models both the universal and the class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results verify that the proposed algorithm performs much better than the conventional method.
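The shared-hidden-layer idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the network depth, the loss, or how the context clustering assigns classes, so the code below assumes a single shared tanh layer, one linear output head per class, a squared-error loss, and class labels that are already given. The class name `SharedHiddenMCL` and all parameter names are hypothetical.

```python
import numpy as np


class SharedHiddenMCL:
    """Illustrative network: one shared hidden layer, one output head per class.

    Assumption: class labels are supplied by the caller. The paper derives them
    with a DNN-based context clustering step that is not reproduced here.
    """

    def __init__(self, n_in, n_hidden, n_out, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Shared parameters: updated by frames of every class (universal part).
        self.W_shared = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b_shared = np.zeros(n_hidden)
        # Class-dependent parameters: one linear head per class.
        self.W_heads = rng.normal(0.0, 0.1, (n_classes, n_hidden, n_out))
        self.b_heads = np.zeros((n_classes, n_out))

    def forward(self, x, class_ids):
        h = np.tanh(x @ self.W_shared + self.b_shared)
        # Route each frame through the output head of its own class.
        y = np.einsum("bh,bho->bo", h, self.W_heads[class_ids])
        return h, y + self.b_heads[class_ids]

    def train_step(self, x, target, class_ids, lr=0.01):
        """One full-batch gradient step on the mean squared error."""
        h, y = self.forward(x, class_ids)
        err = y - target
        loss = float(np.mean(np.sum(err ** 2, axis=1)))
        g_y = 2.0 * err / x.shape[0]

        # Head gradients are accumulated only into the head of each frame's class.
        g_W_heads = np.zeros_like(self.W_heads)
        g_b_heads = np.zeros_like(self.b_heads)
        np.add.at(g_W_heads, class_ids, h[:, :, None] * g_y[:, None, :])
        np.add.at(g_b_heads, class_ids, g_y)

        # The shared layer receives gradient from all classes.
        g_h = np.einsum("bo,bho->bh", g_y, self.W_heads[class_ids])
        g_pre = g_h * (1.0 - h ** 2)  # derivative of tanh

        self.W_shared -= lr * (x.T @ g_pre)
        self.b_shared -= lr * g_pre.sum(axis=0)
        self.W_heads -= lr * g_W_heads
        self.b_heads -= lr * g_b_heads
        return loss
```

In this sketch every frame updates the shared layer, while each output head is updated only by frames of its own class. That is one plausible reading of how universal and class-dependent characteristics could be modeled jointly.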
|Title of host publication|2016 24th European Signal Processing Conference, EUSIPCO 2016|
|Publisher|European Signal Processing Conference, EUSIPCO|
|Number of pages|5|
|Publication status|Published - 2016 Nov 28|
|Event|24th European Signal Processing Conference, EUSIPCO 2016 - Budapest, Hungary (Duration: 2016 Aug 28 → 2016 Sep 2)|
|Name|European Signal Processing Conference|
Bibliographical note
Publisher Copyright: © 2016 IEEE.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering