Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework

Ohsung Kwon, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style token (GST), where the emotion-related style information is represented by an additional style embedding vector. Although the GST is not a new idea, no one has utilized the idea for an emotional speech synthesis task. We explicitly combine the GST idea with the Tacotron2 framework to implement an emotional text-to-speech system. The analysis results demonstrate that the proposed GST structure successfully transfers various types of emotional information to the synthesized speech. Subjective listening tests to evaluate the naturalness and emotional expression of synthesized speech are conducted to verify the superiority of the proposed algorithm.
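
The abstract does not give implementation details, so the following is only a minimal sketch of how a style embedding vector is commonly derived from global style tokens and attached to the text encoder output. It assumes a simplified single-head scaled dot-product attention; the function names (`style_embedding`, `condition_encoder`), the toy token values, and the concatenation step are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(query, tokens):
    """Attend over style tokens and return (weights, embedding).

    query:  hypothetical reference-encoder output, length d
    tokens: K style tokens (learned in a real system), each of length d
    The embedding is the attention-weighted sum of the tokens.
    """
    d = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)
    embedding = [
        sum(w * tok[i] for w, tok in zip(weights, tokens))
        for i in range(d)
    ]
    return weights, embedding


def condition_encoder(encoder_states, embedding):
    """Append the same style embedding to every text-encoder time step."""
    return [state + embedding for state in encoder_states]


# Toy values chosen only for illustration.
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.0]
weights, emb = style_embedding(query, tokens)
conditioned = condition_encoder([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], emb)
```

In this sketch the token most aligned with the query receives the largest weight, and each encoder state grows by the embedding dimension before it would be passed to a decoder.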

Original language: English
Title of host publication: 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728132716
DOIs: https://doi.org/10.1109/ITC-CSCC.2019.8793393
Publication status: Published - 2019 Jun 1
Event: 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019 - JeJu, Korea, Republic of
Duration: 2019 Jun 23 → 2019 Jun 26

Publication series

Name: 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019

Conference

Conference: 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019
Country: Korea, Republic of
City: JeJu
Period: 19/6/23 → 19/6/26

Fingerprint

Speech synthesis

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Kwon, O., Jang, I., Ahn, C., & Kang, H-G. (2019). Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework. In 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019 [8793393] (34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ITC-CSCC.2019.8793393
Kwon, Ohsung ; Jang, Inseon ; Ahn, Chunghyun ; Kang, Hong-Goo. / Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework. 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. (34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019).
@inproceedings{ac80267da6e04103b408d84fe086991b,
title = "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework",
abstract = "In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style token (GST), where the emotion-related style information is represented by an additional style embedding vector. Although the GST is not a new idea, no one has utilized the idea for an emotional speech synthesis task. We explicitly combine the GST idea with the Tacotron2 framework to implement an emotional text-to-speech system. The analysis results demonstrate that the proposed GST structure successfully transfers various types of emotional information to the synthesized speech. Subjective listening tests to evaluate the naturalness and emotional expression of synthesized speech are conducted to verify the superiority of the proposed algorithm.",
author = "Ohsung Kwon and Inseon Jang and Chunghyun Ahn and Hong-Goo Kang",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/ITC-CSCC.2019.8793393",
language = "English",
series = "34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019",
address = "United States",

}

Kwon, O, Jang, I, Ahn, C & Kang, H-G 2019, Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework. in 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019., 8793393, 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019, Institute of Electrical and Electronics Engineers Inc., 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019, JeJu, Korea, Republic of, 19/6/23. https://doi.org/10.1109/ITC-CSCC.2019.8793393

TY - GEN
T1 - Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework
AU - Kwon, Ohsung
AU - Jang, Inseon
AU - Ahn, Chunghyun
AU - Kang, Hong-Goo
PY - 2019/6/1
Y1 - 2019/6/1
N2 - In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style token (GST), where the emotion-related style information is represented by an additional style embedding vector. Although the GST is not a new idea, no one has utilized the idea for an emotional speech synthesis task. We explicitly combine the GST idea with the Tacotron2 framework to implement an emotional text-to-speech system. The analysis results demonstrate that the proposed GST structure successfully transfers various types of emotional information to the synthesized speech. Subjective listening tests to evaluate the naturalness and emotional expression of synthesized speech are conducted to verify the superiority of the proposed algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85071500622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071500622&partnerID=8YFLogxK
U2 - 10.1109/ITC-CSCC.2019.8793393
DO - 10.1109/ITC-CSCC.2019.8793393
M3 - Conference contribution
AN - SCOPUS:85071500622
T3 - 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019
BT - 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
ER -

Kwon O, Jang I, Ahn C, Kang H-G. Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework. In 34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019. Institute of Electrical and Electronics Engineers Inc. 2019. 8793393. (34th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2019). https://doi.org/10.1109/ITC-CSCC.2019.8793393