In this paper, we propose a new method for reconstructing speech from the spectral envelope and pitch intervals. The method is applicable on the network side of a distributed speech recognition system as a play-back function. The spectral envelope of speech is represented as a set of mel-frequency cepstral coefficients, which are well-known recognition parameters. First, sinusoidal synthesis with a zero-phase model is used to obtain a pitch-based waveform. To enhance the naturalness of the speech, we then replace the zero-phase information with entries from pre-stored linear and random codebooks. The final phase information is determined by the energy ratio between the linear and random components. Unlike classical low-bit-rate speech coding, however, the energy ratio is estimated in the decoding stage by applying a time-frequency filter to the pitch-based synthesized signal. Thus, the phase information is not a feature parameter transmitted from the encoder side. The proposed phase generation method relies on the knowledge that pitch variation is a main cause of the mixed characteristics in speech signals. An informal listening test indicates that the quality of the proposed method is much better than that of the baseline synthetic speech.
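The zero-phase sinusoidal synthesis step and the later phase replacement can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `synthesize_frame` and `mixed_phases`, the 8 kHz sampling rate, the 160-sample frame, the `envelope` callable standing in for an envelope recovered from the mel-frequency cepstral coefficients, and the simple rule that gives linear phase to the lower harmonics and random phase to the rest. The paper itself draws phases from pre-stored codebooks and estimates the energy ratio with a time-frequency filter, neither of which is reproduced here.

```python
import math
import random


def synthesize_frame(f0_hz, envelope, sample_rate=8000, n_samples=160, phases=None):
    """Sum the harmonics of f0_hz, each weighted by envelope(frequency).

    With phases=None every harmonic starts at phase 0 (zero-phase model).
    """
    # Keep only harmonics strictly below the Nyquist frequency.
    n_harmonics = math.ceil(sample_rate / (2 * f0_hz)) - 1
    if phases is None:
        phases = [0.0] * n_harmonics
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k in range(1, n_harmonics + 1):
            freq = k * f0_hz
            sample += envelope(freq) * math.cos(2 * math.pi * freq * t + phases[k - 1])
        frame.append(sample)
    return frame


def mixed_phases(n_harmonics, linear_ratio, f0_hz, delay_s, rng):
    """Hypothetical mixing rule (an assumption, not the paper's codebooks).

    The lowest round(linear_ratio * n_harmonics) harmonics get a linear
    phase (a pure delay of delay_s seconds); the others get a random phase.
    """
    n_linear = round(linear_ratio * n_harmonics)
    phases = []
    for k in range(1, n_harmonics + 1):
        if k <= n_linear:
            phases.append(-2 * math.pi * k * f0_hz * delay_s)
        else:
            phases.append(rng.uniform(-math.pi, math.pi))
    return phases
```

A zero-phase frame is obtained with `synthesize_frame(100.0, lambda f: 1.0)`. Passing `phases=mixed_phases(39, 0.5, 100.0, 0.001, random.Random(0))` instead yields a frame whose upper harmonics are decorrelated, which is the general idea behind mixing linear and random phase components.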
|Journal||ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings|
|Publication status||Published - 2002 Jul 11|
|Event||2002 IEEE International Conference on Acoustics, Speech, and Signal Processing - Orlando, FL, United States|
Duration: 2002 May 13 → 2002 May 17
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering