A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system

Jin Seob Kim, Young Sun Joo, Hong Goo Kang, Inseon Jang, Chunghyun Ahn, Jeongil Seo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces coupling effects between the vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same context of phonetic classes becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems, which have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
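
As a rough illustration of what framing at glottal closure instants means in practice, the following Python sketch cuts one frame per GCI, spanning the two neighbouring pitch periods, and contrasts it with conventional fixed-rate framing. It is not the authors' implementation: the function names, the symmetric Hann window, the two-period span, the synthetic sine signal, the 16 kHz sampling rate, and the hand-picked GCI positions are all assumptions made for the example.

```python
import math


def pitch_synchronous_frames(signal, gcis):
    """Cut one Hann-windowed frame per interior GCI.

    Each frame runs from the previous GCI to the next one, so it covers
    two pitch periods and its length follows the local pitch.
    `gcis` is assumed to be a strictly increasing list of sample indices
    that all lie inside `signal`.
    """
    frames = []
    for prev, cur, nxt in zip(gcis, gcis[1:], gcis[2:]):
        segment = signal[prev:nxt]
        n = len(segment)
        # Symmetric Hann window over the whole two-period span.
        window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
        frames.append((cur, [s * w for s, w in zip(segment, window)]))
    return frames


def fixed_rate_frames(signal, frame_len, hop):
    """Conventional framing: constant length and constant hop."""
    last_start = len(signal) - frame_len
    return [signal[start:start + frame_len] for start in range(0, last_start + 1, hop)]


# Synthetic 200 Hz sine at an assumed 16 kHz sampling rate (500 samples).
fs = 16000
signal = [math.sin(2 * math.pi * 200 * t / fs) for t in range(500)]

# Hypothetical GCI positions whose spacing grows, i.e. the pitch is falling.
gcis = [0, 80, 160, 250, 340, 440]

ps_frames = pitch_synchronous_frames(signal, gcis)
ps_lengths = [len(frame) for _, frame in ps_frames]

# 25 ms frames with a 5 ms hop, a common fixed-rate setting.
fr_lengths = [len(frame) for frame in fixed_rate_frames(signal, 400, 80)]

print("pitch-synchronous frame lengths:", ps_lengths)
print("fixed-rate frame lengths:", fr_lengths)
```

The pitch-synchronous frames differ in length and are unevenly spaced in time, whereas the fixed-rate frames are all identical in size. This is the mismatch the abstract points to: a DNN expects a fixed frame rate and a fixed feature dimension, so variable-length frames still have to be mapped to fixed-size feature vectors, and the abstract does not say how the proposed system does this.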

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 408-411
Number of pages: 4
ISBN (Electronic): 9781509041657
DOIs: https://doi.org/10.1109/ICDSP.2016.7868589
Publication status: Published - 2016 Jul 2
Event: 2016 IEEE International Conference on Digital Signal Processing, DSP 2016 - Beijing, China
Duration: 2016 Oct 16 to 2016 Oct 18

Publication series

Name: International Conference on Digital Signal Processing, DSP
Volume: 0

Other

Other: 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
Country: China
City: Beijing
Period: 16/10/16 to 16/10/18

Fingerprint

Speech synthesis
Speech analysis
Deep neural networks

All Science Journal Classification (ASJC) codes

  • Signal Processing

Cite this

Kim, J. S., Joo, Y. S., Kang, H. G., Jang, I., Ahn, C., & Seo, J. (2016). A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system. In Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016 (pp. 408-411). [7868589] (International Conference on Digital Signal Processing, DSP; Vol. 0). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDSP.2016.7868589
Kim, Jin Seob ; Joo, Young Sun ; Kang, Hong Goo ; Jang, Inseon ; Ahn, Chunghyun ; Seo, Jeongil. / A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system. Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 408-411 (International Conference on Digital Signal Processing, DSP).
@inproceedings{8d41fe4360364cf6b477326afa78e857,
title = "A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system",
author = "Kim, {Jin Seob} and Joo, {Young Sun} and Kang, {Hong Goo} and Inseon Jang and Chunghyun Ahn and Jeongil Seo",
year = "2016",
month = "7",
day = "2",
doi = "10.1109/ICDSP.2016.7868589",
language = "English",
series = "International Conference on Digital Signal Processing, DSP",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "408--411",
booktitle = "Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016",
address = "United States",

}

Kim, JS, Joo, YS, Kang, HG, Jang, I, Ahn, C & Seo, J 2016, A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system. in Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016, 7868589, International Conference on Digital Signal Processing, DSP, vol. 0, Institute of Electrical and Electronics Engineers Inc., pp. 408-411, 2016 IEEE International Conference on Digital Signal Processing, DSP 2016, Beijing, China, 16/10/16. https://doi.org/10.1109/ICDSP.2016.7868589

TY - GEN
T1 - A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system
AU - Kim, Jin Seob
AU - Joo, Young Sun
AU - Kang, Hong Goo
AU - Jang, Inseon
AU - Ahn, Chunghyun
AU - Seo, Jeongil
PY - 2016/7/2
Y1 - 2016/7/2
UR - http://www.scopus.com/inward/record.url?scp=85016224967&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016224967&partnerID=8YFLogxK
U2 - 10.1109/ICDSP.2016.7868589
DO - 10.1109/ICDSP.2016.7868589
M3 - Conference contribution
AN - SCOPUS:85016224967
T3 - International Conference on Digital Signal Processing, DSP
SP - 408
EP - 411
BT - Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
PB - Institute of Electrical and Electronics Engineers Inc.
ER -

Kim JS, Joo YS, Kang HG, Jang I, Ahn C, Seo J. A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system. In Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016. Institute of Electrical and Electronics Engineers Inc. 2016. p. 408-411. 7868589. (International Conference on Digital Signal Processing, DSP). https://doi.org/10.1109/ICDSP.2016.7868589