Modeling-By-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System

Min Jae Hwang, Eunwoo Song, Kyungguen Byun, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper proposes a novel noise compensation algorithm for a glottal excitation model in a deep learning (DL)-based speech synthesis system. To generate high-quality synthesized speech, the DL network should represent the balance between the harmonic and noise components of the glottal excitation signal well. However, it is hard to model the noise component accurately because the DL training process inevitably produces statistically smoothed outputs; thus, an additional noise compensation process is essential. We propose a modeling-by-generation structure-based noise compensation method in which the missing noise component in the generated glottal signal is directly extracted and parameterized during the entire training process. By modeling the noise component with an additional DL network, the proposed system successfully compensates for the missing noise component. Objective and subjective test results confirm that speech synthesized with the proposed noise compensation method is superior to that synthesized with conventional methods.
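As a rough illustration of the extraction step described in the abstract, the sketch below treats the missing noise as the sample-wise difference between a reference glottal signal and an over-smoothed generated one, then summarizes it frame by frame. This is not the authors' implementation: the function names, the synthetic signals, the frame sizes, and the choice of frame-wise log energy as the noise parameter are all assumptions made for illustration only.

```python
import numpy as np


def frame_log_energy(signal, frame_len=400, hop=80):
    """Frame-wise log energy; a hypothetical stand-in for a noise parameterization."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Small constant avoids log(0) on silent frames.
        energies[i] = np.log(np.mean(frame ** 2) + 1e-10)
    return energies


def extract_missing_noise(target_glottal, generated_glottal):
    """Missing noise taken as the residual between target and generated signals (assumed)."""
    return target_glottal - generated_glottal


# Synthetic example: the "generated" signal keeps only the harmonic part,
# mimicking a statistically smoothed network output.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
harmonic = np.sin(2.0 * np.pi * 120.0 * t)       # 120 Hz harmonic component
noise = 0.1 * rng.standard_normal(t.shape)       # noise component
target = harmonic + noise                        # reference glottal signal
generated = harmonic                             # smoothed output, noise missing

residual = extract_missing_noise(target, generated)
params = frame_log_energy(residual)              # per-frame noise parameters
print(params.shape)
```

In a system of the kind the abstract describes, parameters like `params` would serve as training targets for the additional noise-modeling network; how the paper actually parameterizes the noise is not stated in this record.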

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5669-5673
Number of pages: 5
ISBN (Print): 9781538646588
DOI: https://doi.org/10.1109/ICASSP.2018.8461606
Publication status: Published - 2018 Sep 10
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 2018 Apr 15 to 2018 Apr 20

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (Print): 1520-6149

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country: Canada
City: Calgary
Period: 18/4/15 to 18/4/20

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Hwang, M. J., Song, E., Byun, K., & Kang, H. G. (2018). Modeling-By-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (pp. 5669-5673). [8461606] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461606