ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

Eunwoo Song, Kyungguen Byun, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a WaveNet-based neural excitation model (ExcitNet) for statistical parametric speech synthesis systems. Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework. However, they often suffer from noisy outputs because of the difficulties in capturing the complicated time-varying nature of speech signals. To improve modeling efficiency, the proposed ExcitNet vocoder employs an adaptive inverse filter to decouple spectral components from the speech signal. The residual component, i.e., the excitation signal, is then trained and generated within the WaveNet framework. In this way, the quality of the synthesized speech signal can be further improved, since the spectral component is well represented by a deep learning framework and the residual component is efficiently generated by the WaveNet framework. Experimental results show that the proposed ExcitNet vocoder, trained both speaker-dependently and speaker-independently, outperforms traditional linear prediction vocoders and similarly configured conventional WaveNet vocoders.
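The decoupling step described in the abstract can be illustrated with ordinary linear prediction. The following is a minimal sketch, not the authors' implementation: it assumes a single frame with fixed predictor coefficients (the paper's filter is adaptive), a toy synthetic signal, and an illustrative predictor order of 8. All function and variable names are hypothetical. The inverse filter removes the spectral envelope and leaves a lower-energy residual (the excitation), and the matching synthesis filter restores the waveform from that residual. In ExcitNet, the residual is what the WaveNet-style model would be trained to generate; that part is not shown here.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] such that
    s[n] is approximated by sum_k a[k] * s[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k                 # remaining prediction error
    return a[1:]


def inverse_filter(s, a):
    """Excitation e[n] = s[n] - sum_k a[k] * s[n-k] (zero initial state)."""
    e = []
    for n in range(len(s)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(s[n] - pred)
    return e


def synthesis_filter(e, a):
    """Reconstruct s[n] = e[n] + sum_k a[k] * s[n-k] (zero initial state)."""
    s = []
    for n in range(len(e)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        s.append(e[n] + pred)
    return s


# Toy "speech-like" frame: two sinusoids plus a little seeded noise (assumption).
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 0.05 * n)
          + 0.5 * math.sin(2 * math.pi * 0.12 * n)
          + 0.05 * rng.gauss(0.0, 1.0)
          for n in range(200)]

ORDER = 8  # illustrative predictor order, not taken from the paper
coeffs = levinson_durbin(autocorrelation(signal, ORDER), ORDER)
excitation = inverse_filter(signal, coeffs)
rebuilt = synthesis_filter(excitation, coeffs)

signal_energy = sum(x * x for x in signal)
excitation_energy = sum(x * x for x in excitation)
max_rebuild_error = max(abs(x - y) for x, y in zip(signal, rebuilt))
print(signal_energy, excitation_energy, max_rebuild_error)
```

Running the sketch prints the signal energy, the excitation energy, and the largest reconstruction error. The excitation carries less energy than the original frame because the predictable spectral structure has been removed, which is the property the abstract credits for easier modeling.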

Original language: English
Title of host publication: EUSIPCO 2019 - 27th European Signal Processing Conference
Publisher: European Signal Processing Conference, EUSIPCO
ISBN (Electronic): 9789082797039
DOIs: https://doi.org/10.23919/EUSIPCO.2019.8902701
Publication status: Published - 2019 Sep
Event: 27th European Signal Processing Conference, EUSIPCO 2019 - A Coruna, Spain
Duration: 2019 Sep 2 to 2019 Sep 6

Publication series

Name: European Signal Processing Conference
Volume: 2019-September
ISSN (Print): 2219-5491

Conference

Conference: 27th European Signal Processing Conference, EUSIPCO 2019
Country: Spain
City: A Coruna
Period: 19/9/2 to 19/9/6

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Song, E., Byun, K., & Kang, H. G. (2019). ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems. In EUSIPCO 2019 - 27th European Signal Processing Conference (European Signal Processing Conference; Vol. 2019-September). European Signal Processing Conference, EUSIPCO. https://doi.org/10.23919/EUSIPCO.2019.8902701