Abstract
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has achieved high-quality, lightweight speech synthesis by combining a vocal tract LP filter with a WaveRNN-based vocal source (i.e., excitation) generator. However, the quality of the synthesized speech is often unstable because the vocal source component is insufficiently represented by μ-law quantization, and the model is trained without considering the entire speech production mechanism. To address this problem, we first introduce the LP-MDN, which enables the autoregressive neural vocoder to structurally represent the interactions between the vocal tract and vocal source components. We then incorporate the LP-MDN into the LPCNet vocoder by replacing the conventional discretized output with a continuous density distribution. Experimental results show that the proposed system produces high-quality synthetic speech, achieving a mean opinion score (MOS) of 4.41 within a text-to-speech framework.
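To make the idea of an LP-structured output concrete, the following is a minimal Python sketch of how an LP-MDN could score one speech sample. It rests on assumptions not stated in the abstract: the network predicts a Gaussian mixture for the excitation e_t, and the vocal tract LP prediction is p_t = Σ_k a_k s_{t−k}, so that s_t = p_t + e_t. Under those assumptions the speech distribution is simply the excitation mixture with every mean shifted by p_t. All function names and toy values below are hypothetical, not the authors' code.

```python
import math
from typing import Sequence


def lp_prediction(past: Sequence[float], lpc: Sequence[float]) -> float:
    """Vocal-tract LP prediction p_t = sum_k a_k * s_{t-k}.

    `past` holds s_{t-1}, s_{t-2}, ... (most recent first) and `lpc` holds
    a_1, ..., a_P. The "+" sign convention is an assumption of this sketch.
    """
    return sum(a * s for a, s in zip(lpc, past))


def mixture_log_density(x: float, weights: Sequence[float],
                        means: Sequence[float], scales: Sequence[float]) -> float:
    """Log-density of x under a 1-D Gaussian mixture (log-sum-exp for stability)."""
    comps = []
    for w, mu, sigma in zip(weights, means, scales):
        z = (x - mu) / sigma
        comps.append(math.log(w) - 0.5 * z * z
                     - math.log(sigma * math.sqrt(2.0 * math.pi)))
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))


def lp_mdn_log_likelihood(s_t: float, past: Sequence[float], lpc: Sequence[float],
                          weights: Sequence[float], exc_means: Sequence[float],
                          scales: Sequence[float]) -> float:
    """Log-likelihood of speech sample s_t under a hypothetical LP-structured MDN.

    The network is assumed to output a Gaussian mixture for the excitation e_t.
    Since s_t = p_t + e_t, the speech distribution is the same mixture with each
    mean shifted by the LP prediction p_t (weights and scales are unchanged).
    """
    p_t = lp_prediction(past, lpc)
    speech_means = [mu + p_t for mu in exc_means]
    return mixture_log_density(s_t, weights, speech_means, scales)


# Toy values standing in for network outputs and frame-level LPCs (hypothetical).
lpc = [1.2, -0.5]
past = [0.3, 0.1]
ll = lp_mdn_log_likelihood(0.4, past, lpc, [0.6, 0.4], [0.0, 0.05], [0.02, 0.1])
print(f"log p(s_t) = {ll:.4f}")
```

Because the LP shift is applied inside the density, the likelihood of s_t equals the excitation-mixture likelihood of the residual s_t − p_t. A training loss built this way therefore depends on both the vocal tract (LP) part and the vocal source (mixture) part at once, and the continuous mixture avoids the fixed set of output levels that μ-law quantization imposes.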
Original language | English |
---|---|
Title of host publication | 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 7219-7223 |
Number of pages | 5 |
ISBN (Electronic) | 9781509066315 |
DOIs | |
Publication status | Published - 2020 May |
Event | 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain. Duration: 2020 May 4 → 2020 May 8 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2020-May |
ISSN (Print) | 1520-6149 |
Conference
Conference | 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 2020/5/4 → 2020/5/8 |
Bibliographical note
Funding Information: The work was supported by Clova Voice, NAVER Corp., Seongnam, Korea.
Publisher Copyright:
© 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Electrical and Electronic Engineering