A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system

Jin Seob Kim, Young Sun Joo, Hong Goo Kang, Inseon Jang, Chunghyun Ahn, Jeongil Seo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces coupling effects between the vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same phonetic context becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems, which have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
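
As a rough illustration of what pitch-synchronous framing means, the sketch below cuts one frame per GCI instead of one frame per fixed time step. It is not the authors' implementation: the abstract does not specify the frame span or window, so the two-pitch-period span, the Hann window, the function name `pitch_synchronous_frames`, and the toy pulse-train signal are all assumptions made for this example, and the GCI locations are taken as given rather than detected.

```python
import numpy as np

def pitch_synchronous_frames(signal, gcis):
    """Cut one Hann-windowed frame per interior GCI.

    Assumption (not from the paper): each frame spans from the previous
    GCI to the next one, i.e. roughly two pitch periods centered on the
    current GCI, so frame length varies with the local pitch.
    """
    signal = np.asarray(signal, dtype=float)
    gcis = np.asarray(gcis, dtype=int)
    frames = []
    for prev, cur, nxt in zip(gcis[:-2], gcis[1:-1], gcis[2:]):
        segment = signal[prev:nxt]
        frames.append((cur, segment * np.hanning(len(segment))))
    return frames

# Toy input: a 100 Hz pulse train sampled at 16 kHz, so the
# hypothetical GCIs are exactly 160 samples apart.
fs = 16000
gcis = np.arange(0, fs // 10, 160)
signal = np.zeros(fs // 10)
signal[gcis] = 1.0

frames = pitch_synchronous_frames(signal, gcis)
for center, frame in frames[:3]:
    print(center, len(frame))
```

With real speech the GCI spacing changes from frame to frame, so both the frame rate and the frame length vary. That variability is what makes the approach hard to fit into the fixed-frame-rate, fixed-dimension structure the abstract mentions.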

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 408-411
Number of pages: 4
ISBN (Electronic): 9781509041657
DOIs: https://doi.org/10.1109/ICDSP.2016.7868589
Publication status: Published - 2016 Jul 2
Event: 2016 IEEE International Conference on Digital Signal Processing, DSP 2016 - Beijing, China
Duration: 2016 Oct 16 → 2016 Oct 18

Publication series

Name: International Conference on Digital Signal Processing, DSP
Volume: 0

Other

Other: 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
Country: China
City: Beijing
Period: 16/10/16 → 16/10/18

All Science Journal Classification (ASJC) codes

  • Signal Processing

Cite this

Kim, J. S., Joo, Y. S., Kang, H. G., Jang, I., Ahn, C., & Seo, J. (2016). A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system. In Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016 (pp. 408-411). [7868589] (International Conference on Digital Signal Processing, DSP; Vol. 0). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDSP.2016.7868589