Deep bi-directional long short-term memory based speech enhancement for wind noise reduction

Jinkyu Lee, Keulbit Kim, Turaj Shabestary, Hong-Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

In this paper, we propose a new recurrent neural network (RNN)-based single-channel speech enhancement framework for off-line wind noise reduction. To adequately represent the highly non-stationary characteristics of wind noise, we first adopt a deep bi-directional long short-term memory (DBLSTM) structure. However, its enhanced output becomes muffled due to the spectral over-smoothing effect. To overcome this problem, we propose a new DBLSTM-based speech enhancement structure that internally incorporates the speech and noise power estimation processes in the spectral filtering framework. Furthermore, we propose a structure with an additional internal constraint of minimizing the log a priori SNR, which provides an efficient learning mechanism. Experimental results show that the proposed method improves the source-to-distortion ratio (SDR) by 6.9 dB and the perceptual evaluation of speech quality (PESQ) score by 0.24 compared with the conventional DBLSTM-based system.
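The spectral filtering idea in the abstract can be illustrated with a minimal sketch. The network estimates speech and noise powers, and these are turned into a per-bin gain instead of being used to regress the clean spectrum directly. The sketch makes several assumptions that the abstract does not confirm. It uses a Wiener-type gain G = xi / (1 + xi) with a priori SNR xi = S / N. It reads the log a priori SNR constraint as a mean-squared error between the log a priori SNR implied by the estimates and the one computed from oracle clean and noise powers. It omits the DBLSTM that would produce the estimates. The function names (wiener_gain, apply_gain, log_a_priori_snr, log_snr_loss, si_sdr) are hypothetical. For evaluation, si_sdr substitutes the simpler scale-invariant SDR for the BSS Eval SDR, so its values will not match SDR figures computed with an evaluation toolkit.

```python
import numpy as np

EPS = 1e-12  # small floor to avoid division by zero and log(0)


def wiener_gain(speech_psd, noise_psd):
    """Per-bin gain G = xi / (1 + xi), where xi = S / N is the a priori SNR (assumed Wiener form)."""
    xi = speech_psd / (noise_psd + EPS)
    return xi / (1.0 + xi)


def apply_gain(noisy_stft, speech_psd, noise_psd):
    """Scale the complex noisy STFT by the real-valued gain; the noisy phase is kept."""
    return wiener_gain(speech_psd, noise_psd) * noisy_stft


def log_a_priori_snr(speech_psd, noise_psd):
    """log(xi) = log S - log N, computed as a difference of logs for numerical stability."""
    return np.log(speech_psd + EPS) - np.log(noise_psd + EPS)


def log_snr_loss(est_speech_psd, est_noise_psd, ref_speech_psd, ref_noise_psd):
    """Hypothetical auxiliary loss: mean squared error between the log a priori SNR
    implied by the estimated powers and the one computed from oracle powers."""
    diff = (log_a_priori_snr(est_speech_psd, est_noise_psd)
            - log_a_priori_snr(ref_speech_psd, ref_noise_psd))
    return float(np.mean(diff ** 2))


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB (a simplified stand-in for the BSS Eval SDR)."""
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + EPS)
    target = alpha * reference          # part of the estimate explained by the reference
    distortion = estimate - target      # everything else is counted as distortion
    return 10.0 * np.log10((np.dot(target, target) + EPS)
                           / (np.dot(distortion, distortion) + EPS))


# Toy usage with random "power spectra" (frames x frequency bins); no real audio involved.
rng = np.random.default_rng(0)
clean_psd = rng.random((4, 257)) + 0.1
noise_psd = rng.random((4, 257)) + 0.1
noisy_stft = rng.standard_normal((4, 257)) + 1j * rng.standard_normal((4, 257))

enhanced_stft = apply_gain(noisy_stft, clean_psd, noise_psd)
print(enhanced_stft.shape)  # (4, 257)
print(log_snr_loss(clean_psd, noise_psd, clean_psd, noise_psd))  # 0.0
```

In this form the output is always the noisy spectrum scaled by a bounded gain (0 < G < 1). One plausible reading of how such a structure counters spectral over-smoothing is that it preserves more fine spectral detail than directly regressing the clean spectrum.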

Original language: English
Title of host publication: 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 41-45
Number of pages: 5
ISBN (Electronic): 9781509059256
DOI: 10.1109/HSCMA.2017.7895558
Publication status: Published - 2017 Apr 10
Event: 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - San Francisco, United States
Duration: 2017 Mar 1 to 2017 Mar 3

Publication series

Name: 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings

Other

Other: 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017
Country: United States
City: San Francisco
Period: 17/3/1 to 17/3/3

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Acoustics and Ultrasonics
  • Instrumentation
  • Communication

Cite this

Lee, J., Kim, K., Shabestary, T., & Kang, H-G. (2017). Deep bi-directional long short-term memory based speech enhancement for wind noise reduction. In 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings (pp. 41-45). [7895558] (2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HSCMA.2017.7895558
@inproceedings{84645b21c83842339c4666bec3664760,
title = "Deep bi-directional long short-term memory based speech enhancement for wind noise reduction",
abstract = "In this paper, we propose a new recurrent neural network (RNN)-based single-channel speech enhancement framework for off-line wind noise reduction. To adequately represent the highly non-stationary characteristics of wind noise, we first adopt a deep bi-directional long short-term memory (DBLSTM) structure. However, its enhanced output becomes muffled due to the spectral over-smoothing effect. To overcome this problem, we propose a new DBLSTM-based speech enhancement structure that internally incorporates the speech and noise power estimation processes in the spectral filtering framework. Furthermore, we propose a structure with an additional internal constraint of minimizing the log a priori SNR, which provides an efficient learning mechanism. Experimental results show that the proposed method improves the source-to-distortion ratio (SDR) by 6.9 dB and the perceptual evaluation of speech quality (PESQ) score by 0.24 compared with the conventional DBLSTM-based system.",
author = "Jinkyu Lee and Keulbit Kim and Turaj Shabestary and Hong-Goo Kang",
year = "2017",
month = "4",
day = "10",
doi = "10.1109/HSCMA.2017.7895558",
language = "English",
series = "2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "41--45",
booktitle = "2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings",
address = "United States",

}

TY - GEN

T1 - Deep bi-directional long short-term memory based speech enhancement for wind noise reduction

AU - Lee, Jinkyu

AU - Kim, Keulbit

AU - Shabestary, Turaj

AU - Kang, Hong-Goo

PY - 2017/4/10

Y1 - 2017/4/10

AB - In this paper, we propose a new recurrent neural network (RNN)-based single-channel speech enhancement framework for off-line wind noise reduction. To adequately represent the highly non-stationary characteristics of wind noise, we first adopt a deep bi-directional long short-term memory (DBLSTM) structure. However, its enhanced output becomes muffled due to the spectral over-smoothing effect. To overcome this problem, we propose a new DBLSTM-based speech enhancement structure that internally incorporates the speech and noise power estimation processes in the spectral filtering framework. Furthermore, we propose a structure with an additional internal constraint of minimizing the log a priori SNR, which provides an efficient learning mechanism. Experimental results show that the proposed method improves the source-to-distortion ratio (SDR) by 6.9 dB and the perceptual evaluation of speech quality (PESQ) score by 0.24 compared with the conventional DBLSTM-based system.

UR - http://www.scopus.com/inward/record.url?scp=85018737966&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018737966&partnerID=8YFLogxK

U2 - 10.1109/HSCMA.2017.7895558

DO - 10.1109/HSCMA.2017.7895558

M3 - Conference contribution

AN - SCOPUS:85018737966

T3 - 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings

SP - 41

EP - 45

BT - 2017 Hands-Free Speech Communications and Microphone Arrays, HSCMA 2017 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
