Acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments

Hong Kook Kim, Richard C. Rose, Hong-Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a set of acoustic feature pre-processing techniques that are applied to improve automatic speech recognition (ASR) performance on the Aurora 2 noisy speech recognition task. The principal contribution of this paper is an approach for cepstrum domain feature compensation in ASR, motivated by techniques for decomposing speech and noise that were originally developed for noisy speech enhancement. This approach is applied in combination with other feature compensation algorithms to compensate ASR features obtained from a mel-filterbank cepstrum coefficient (MFCC) front-end. Performance is compared with that obtained by applying the minimum mean squared error log spectral amplitude estimator (MMSE-LSA) based speech enhancement algorithm prior to feature analysis. An experimental study is presented in which the feature compensation approaches described in the paper are found to reduce ASR word error rate by as much as 31% relative to uncompensated features under simulated environmental and channel mismatched conditions.
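The "31% relative" figure in the abstract is a relative, not absolute, reduction in word error rate (WER). As a minimal sketch of that arithmetic, the snippet below computes a relative reduction from a baseline and a compensated WER. The function name and the two WER values are hypothetical, chosen only so the result comes out to 31%; they are not results reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, compensated_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - compensated_wer) / baseline_wer


# Hypothetical WER values in percent (illustration only, NOT from the paper).
baseline = 20.0      # assumed WER with uncompensated features
compensated = 13.8   # assumed WER with compensated features

reduction = relative_wer_reduction(baseline, compensated)
print(f"Relative WER reduction: {reduction:.0%}")
```

Under these assumed numbers the absolute reduction is 6.2 percentage points, while the relative reduction is 6.2 / 20.0, or about 31%.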

Original language: English
Title of host publication: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
Editors: Borge Lindberg, Henrik Benner, Paul Dalsgaard, Zheng-Hua Tan
Publisher: International Speech Communication Association
Pages: 421-424
Number of pages: 4
ISBN (Electronic): 8790834100, 9788790834104
Publication status: Published - 2001 Jan 1
Event: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 2001 Sep 3 to 2001 Sep 7

Other

Other: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
Country: Denmark
City: Aalborg
Period: 01/9/3 to 01/9/7

Fingerprint

Speech recognition
Acoustic noise
Acoustics
Decomposition
Speech enhancement
performance comparison
Compensation and Redress
Processing
performance

All Science Journal Classification (ASJC) codes

  • Communication
  • Linguistics and Language
  • Computer Science Applications
  • Software

Cite this

Kim, H. K., Rose, R. C., & Kang, H-G. (2001). Acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments. In B. Lindberg, H. Benner, P. Dalsgaard, & Z-H. Tan (Eds.), EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology (pp. 421-424). International Speech Communication Association.
@inproceedings{361d68b25c4b4ad2981b0a88d32bb4dd,
title = "Acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments",
author = "Kim, {Hong Kook} and Rose, {Richard C.} and Kang, Hong-Goo",
year = "2001",
month = "1",
day = "1",
language = "English",
pages = "421--424",
editor = "Borge Lindberg and Henrik Benner and Paul Dalsgaard and Zheng-Hua Tan",
booktitle = "EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology",
publisher = "International Speech Communication Association",

}

TY - GEN

T1 - Acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments

AU - Kim, Hong Kook

AU - Rose, Richard C.

AU - Kang, Hong-Goo

PY - 2001/1/1

Y1 - 2001/1/1

UR - http://www.scopus.com/inward/record.url?scp=85009124851&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009124851&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85009124851

SP - 421

EP - 424

BT - EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology

A2 - Lindberg, Borge

A2 - Benner, Henrik

A2 - Dalsgaard, Paul

A2 - Tan, Zheng-Hua

PB - International Speech Communication Association

ER -