Efficient deep neural networks for speech synthesis using bottleneck features

Young Sun Joo, Won Suk Jun, Hong-Goo Kang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper proposes a cascading deep neural network (DNN) structure for speech synthesis system that consists of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike conventional single structure that requires a large database to find complicated mapping rules between linguistic and acoustic features, the proposed structure is very effective even if the available training database is inadequate. The bottle-neck feature utilized in the proposed approach represents the characteristics of linguistic features and its average acoustic features of several speakers. Therefore, it is more efficient to learn a mapping rule between bottleneck and acoustic features than to learn directly a mapping rule between linguistic and acoustic features. Experimental results show that the learning capability of the proposed structure is much higher than that of the conventional structures. Objective and subjective listening test results also verify the superiority of the proposed structure.

Original languageEnglish
Title of host publication2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9789881476821
DOIs
Publication statusPublished - 2017 Jan 17
Event2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, Korea, Republic of
Duration: 2016 Dec 132016 Dec 16

Other

Other2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
CountryKorea, Republic of
CityJeju
Period16/12/1316/12/16

    Fingerprint

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Signal Processing

Cite this

Joo, Y. S., Jun, W. S., & Kang, H-G. (2017). Efficient deep neural networks for speech synthesis using bottleneck features. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 [7820721] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/APSIPA.2016.7820721