A phase generation method for speech reconstruction from spectral envelope and pitch intervals

Hong-Goo Kang, Hong Kook Kim

Research output: Contribution to journal › Conference article

2 Citations (Scopus)

Abstract

In this paper, we propose a new speech reconstruction method from spectral envelope and pitch intervals, which is applicable to the network side of a distributed speech recognition system as a play-back function. The spectral envelope of speech is represented as a set of mel-frequency cepstral coefficients, a well-known set of recognition parameters. First, a sinusoidal synthesis with a zero-phase model is used to obtain a pitch-based waveform. To enhance the naturalness of the speech, we replace the zero-phase information with pre-stored linear and random codebooks. The final phase information is determined by the energy ratio between linear and random components. Unlike classic low bit-rate speech coding, however, the energy ratio is estimated in the decoding stage from a time-frequency filter applied to the pitch-based synthesized signal. Thus, the phase information is not a feature parameter from the encoder side. The proposed phase generation method uses the knowledge that pitch variation is a main cause of the mixed characteristics in speech signals. An informal listening test verifies that the quality of the proposed method is much better than that of the synthetic quality.
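The zero-phase sinusoidal synthesis step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesize_frame`, the scalar `random_ratio`, and the uniform random phase draw are all assumptions made for illustration. The paper describes pre-stored linear and random codebooks and an energy ratio estimated by a time-frequency filter, none of which is reproduced here; a single scalar that scales a random phase offset stands in for that mixing.

```python
import math
import random


def synthesize_frame(amplitudes, f0_hz, sample_rate, num_samples,
                     random_ratio=0.0, seed=0):
    """Sum-of-harmonics synthesis for one frame (illustrative only).

    amplitudes[k] is the amplitude of harmonic k + 1, e.g. sampled from a
    spectral envelope at multiples of the pitch frequency f0_hz.
    random_ratio is a hypothetical mixing weight in [0, 1]: 0 keeps the
    zero-phase model, 1 gives each harmonic a fully random phase offset.
    """
    rng = random.Random(seed)
    # One phase offset per harmonic; all offsets are 0 when random_ratio is 0.
    phases = [random_ratio * rng.uniform(-math.pi, math.pi)
              for _ in amplitudes]
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        frame.append(sample)
    return frame


# Zero-phase model: all harmonics align at n = 0, giving a sharp pitch pulse.
zero_phase = synthesize_frame([1.0, 0.5, 0.25], 100.0, 8000, 80)

# Randomized phases spread the energy over the pitch period.
mixed = synthesize_frame([1.0, 0.5, 0.25], 100.0, 8000, 80,
                         random_ratio=1.0, seed=1)
```

With zero phase, every harmonic peaks at the start of the pitch period, so the first sample equals the sum of the amplitudes. Non-zero phase offsets break that alignment, which is the kind of change the abstract's phase replacement introduces, though the paper's actual method for choosing the phases differs from this sketch.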

Original language: English
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
Publication status: Published - 2002 Jul 11
Event: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing - Orlando, FL, United States
Duration: 2002 May 13 to 2002 May 17

Fingerprint

Speech coding
Speech recognition
Decoding

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{b0852861f6e145f1a5c05eaf0efc84cf,
title = "A phase generation method for speech reconstruction from spectral envelope and pitch intervals",
abstract = "In this paper, we propose a new speech reconstruction method from spectral envelope and pitch intervals, which is applicable to the network side of a distributed speech recognition system as a play-back function. The spectral envelope of speech is represented as a set of mel-frequency cepstral coefficients that is a well-known recognition parameter. First, a sinusoidal synthesis with a zero-phase model is used to obtain a pitch-based waveform. To enhance the naturalness of the speech we replace the zero phase information with pre-stored linear and random codebooks. The ultimate phase information is determined depending on the energy ratio between linear and random components. Unlike the classic low bit-rate speech coding, however, the energy ratio is estimated in the decoding stage from a time-frequency filter applied to the pitch-based synthesized signal. Thus, the phase information is not a feature parameter from the encoder side. The proposed phase generation method uses the knowledge that pitch variation is a main cause of the mixed characteristics in speech signals. An informal listening test verifies that the quality of the proposed method is much better than that of the synthetic quality.",
author = "Kang, Hong-Goo and Kim, {Hong Kook}",
year = "2002",
month = "7",
day = "11",
language = "English",
volume = "1",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - A phase generation method for speech reconstruction from spectral envelope and pitch intervals

AU - Kang, Hong-Goo

AU - Kim, Hong Kook

PY - 2002/7/11

Y1 - 2002/7/11

N2 - In this paper, we propose a new speech reconstruction method from spectral envelope and pitch intervals, which is applicable to the network side of a distributed speech recognition system as a play-back function. The spectral envelope of speech is represented as a set of mel-frequency cepstral coefficients that is a well-known recognition parameter. First, a sinusoidal synthesis with a zero-phase model is used to obtain a pitch-based waveform. To enhance the naturalness of the speech we replace the zero phase information with pre-stored linear and random codebooks. The ultimate phase information is determined depending on the energy ratio between linear and random components. Unlike the classic low bit-rate speech coding, however, the energy ratio is estimated in the decoding stage from a time-frequency filter applied to the pitch-based synthesized signal. Thus, the phase information is not a feature parameter from the encoder side. The proposed phase generation method uses the knowledge that pitch variation is a main cause of the mixed characteristics in speech signals. An informal listening test verifies that the quality of the proposed method is much better than that of the synthetic quality.

UR - http://www.scopus.com/inward/record.url?scp=0036295939&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036295939&partnerID=8YFLogxK

M3 - Conference article

VL - 1

JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

SN - 0736-7791

ER -