Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

Eunwoo Song, Hong-Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does not explicitly consider any local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and train them via a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results also verify that the proposed algorithm performs much better than the conventional method.
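
The shared-hidden-layer idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes that every class (produced by the context clustering step, which is not modeled here) passes through the same stack of hidden layers and then through its own linear output layer, and that training minimizes the mean squared error against target acoustic features. The function names `mcl_forward` and `mcl_batch_loss`, the tanh activation, and the toy parameters are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A dense layer stored as (weight matrix, bias vector); rows are output units.
Layer = Tuple[List[List[float]], List[float]]


def affine(x: Sequence[float], layer: Layer) -> List[float]:
    """Return W x + b for one dense layer."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mcl_forward(x: Sequence[float], class_id: int,
                shared: List[Layer], heads: Dict[int, Layer]) -> List[float]:
    """Hypothetical MCL forward pass: hidden layers shared by all classes,
    followed by the output layer that belongs to the sample's class."""
    h = list(x)
    for layer in shared:  # universal (class-independent) representation
        h = [math.tanh(v) for v in affine(h, layer)]
    return affine(h, heads[class_id])  # class-dependent regression layer


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and target feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mcl_batch_loss(batch: List[Tuple[List[float], int, List[float]]],
                   shared: List[Layer], heads: Dict[int, Layer]) -> float:
    """Average loss over a mixed-class batch of (input, class_id, target).
    Every sample passes through the shared layers, but only its own head."""
    losses = [mse(mcl_forward(x, c, shared, heads), y) for x, c, y in batch]
    return sum(losses) / len(losses)


# Toy parameters: one shared hidden layer (2 -> 2) and two one-output heads.
shared = [([[0.5, -0.5], [0.25, 0.25]], [0.0, 0.0])]
heads = {0: ([[1.0, 0.0]], [0.0]), 1: ([[0.0, 1.0]], [0.0])}

print(mcl_forward([1.0, 1.0], 0, shared, heads))  # [0.0]
print(mcl_forward([1.0, 1.0], 1, shared, heads))  # approximately [0.4621]
```

In this arrangement the shared layers receive gradients from every class while each output layer is fitted only to its own class's data, which is one way to combine universal and class-dependent modeling; the exact layer split and training procedure used in the paper should be taken from the full text.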

Original language: English
Title of host publication: 2016 24th European Signal Processing Conference, EUSIPCO 2016
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 1951-1955
Number of pages: 5
Volume: 2016-November
ISBN (Electronic): 9780992862657
DOI: https://doi.org/10.1109/EUSIPCO.2016.7760589
Publication status: Published - 2016 Nov 28
Event: 24th European Signal Processing Conference, EUSIPCO 2016 - Budapest, Hungary
Duration: 2016 Aug 28 to 2016 Sep 2

Fingerprint

Speech synthesis
Learning algorithms
Speech analysis
Clustering algorithms
Deep neural networks

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Song, E., & Kang, H-G. (2016). Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis. In 2016 24th European Signal Processing Conference, EUSIPCO 2016 (Vol. 2016-November, pp. 1951-1955). [7760589] European Signal Processing Conference, EUSIPCO. https://doi.org/10.1109/EUSIPCO.2016.7760589
Song, Eunwoo ; Kang, Hong-Goo. / Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis. 2016 24th European Signal Processing Conference, EUSIPCO 2016. Vol. 2016-November European Signal Processing Conference, EUSIPCO, 2016. pp. 1951-1955
@inproceedings{e91aa759e54b444ba4c166d26d86bc5c,
title = "Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis",
author = "Eunwoo Song and Hong-Goo Kang",
year = "2016",
month = "11",
day = "28",
doi = "10.1109/EUSIPCO.2016.7760589",
language = "English",
volume = "2016-November",
pages = "1951--1955",
booktitle = "2016 24th European Signal Processing Conference, EUSIPCO 2016",
publisher = "European Signal Processing Conference, EUSIPCO",

}

Song, E & Kang, H-G 2016, Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis. in 2016 24th European Signal Processing Conference, EUSIPCO 2016. vol. 2016-November, 7760589, European Signal Processing Conference, EUSIPCO, pp. 1951-1955, 24th European Signal Processing Conference, EUSIPCO 2016, Budapest, Hungary, 16/8/28. https://doi.org/10.1109/EUSIPCO.2016.7760589

TY - GEN

T1 - Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

AU - Song, Eunwoo

AU - Kang, Hong-Goo

PY - 2016/11/28

Y1 - 2016/11/28

UR - http://www.scopus.com/inward/record.url?scp=85005951086&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85005951086&partnerID=8YFLogxK

U2 - 10.1109/EUSIPCO.2016.7760589

DO - 10.1109/EUSIPCO.2016.7760589

M3 - Conference contribution

VL - 2016-November

SP - 1951

EP - 1955

BT - 2016 24th European Signal Processing Conference, EUSIPCO 2016

PB - European Signal Processing Conference, EUSIPCO

ER -

Song E, Kang H-G. Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis. In 2016 24th European Signal Processing Conference, EUSIPCO 2016. Vol. 2016-November. European Signal Processing Conference, EUSIPCO. 2016. p. 1951-1955. 7760589 https://doi.org/10.1109/EUSIPCO.2016.7760589