Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement

Haemin Yang, Soyeon Choe, Keulbit Kim, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

In single-channel speech enhancement, it is essential to determine noise reduction factors that remove noise while minimizing speech distortion. These factors are typically set as a function of the noise power spectral density (PSD) in the time-frequency domain, and state-of-the-art algorithms introduce an additional step that estimates the speech presence probability (SPP) to further improve the estimation. Because of the many tuning parameters involved, however, it is not easy to implement an algorithm that reliably estimates the SPP in environments with varying noise. We propose a combination of a deep learning network and an effective training method to improve the performance of the SPP estimation module. The proposed approach is a hybrid one: the noise reduction factor is still estimated by a conventional statistics-based single-channel enhancement algorithm. The advantages and disadvantages of the proposed approach compared with a fully deep-learning-based approach to single-channel speech enhancement are also investigated.
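
The hybrid structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the network architecture, the noise PSD update rule, or the gain function, so everything below is an assumption chosen for illustration: the SPP is taken as a ready-made array (standing in for the output of a trained network), the noise PSD is tracked with a commonly used SPP-weighted recursive average, and the noise reduction factor is a Wiener gain with a floor. The names `update_noise_psd`, `wiener_gain`, `enhance`, `alpha`, `gain_floor`, and `init_frames` are hypothetical.

```python
import numpy as np


def update_noise_psd(noise_psd, noisy_power, spp, alpha=0.8):
    """One frame of SPP-weighted recursive noise PSD tracking (assumed rule)."""
    # Where speech is likely (spp near 1) reuse the previous noise estimate;
    # where speech is unlikely (spp near 0) trust the noisy periodogram.
    noise_periodogram = spp * noise_psd + (1.0 - spp) * noisy_power
    # Smooth over time with forgetting factor alpha.
    return alpha * noise_psd + (1.0 - alpha) * noise_periodogram


def wiener_gain(noisy_power, noise_psd, gain_floor=0.1):
    """Noise reduction factor from the noise PSD (assumed: Wiener gain)."""
    # Maximum-likelihood a priori SNR estimate, clipped at zero.
    snr = np.maximum(noisy_power / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    # The floor limits attenuation to reduce speech distortion.
    return np.maximum(snr / (1.0 + snr), gain_floor)


def enhance(noisy_stft, spp, alpha=0.8, gain_floor=0.1, init_frames=5):
    """Apply the two steps frame by frame.

    noisy_stft: complex array of shape (frequency_bins, frames).
    spp: array of the same shape with values in [0, 1]; here a stand-in
         for the output of a learned SPP estimator.
    """
    noisy_power = np.abs(noisy_stft) ** 2
    # Assumption: the first few frames contain noise only.
    noise_psd = noisy_power[:, :init_frames].mean(axis=1)
    enhanced = np.empty_like(noisy_stft)
    for t in range(noisy_stft.shape[1]):
        noise_psd = update_noise_psd(noise_psd, noisy_power[:, t], spp[:, t], alpha)
        gain = wiener_gain(noisy_power[:, t], noise_psd, gain_floor)
        enhanced[:, t] = gain * noisy_stft[:, t]
    return enhanced
```

In this sketch only the `spp` input would come from a learned model; the noise PSD tracking and the gain computation remain conventional statistical steps, which is the split the abstract calls a hybrid approach.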

Original language: English
Title of host publication: 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 267-270
Number of pages: 4
ISBN (Electronic): 9781538656891
DOI: https://doi.org/10.1109/ICSIGSYS.2018.8372770
Publication status: Published - 2018 Jun 4
Event: 2nd International Conference on Signals and Systems, ICSigSys 2018 - Bali, Indonesia
Duration: 2018 May 1 to 2018 May 3

Publication series

Name: 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings

Conference

Conference: 2nd International Conference on Signals and Systems, ICSigSys 2018
Country: Indonesia
City: Bali
Period: 18/5/1 to 18/5/3

Fingerprint

Speech enhancement
Power spectral density
learning
Noise
augmentation
Noise abatement
noise reduction
Acoustic noise
Tuning
Statistics
estimates
Deep learning
Power (Psychology)
education
modules

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Radiology, Nuclear Medicine and Imaging
  • Instrumentation

Cite this

Yang, H., Choe, S., Kim, K., & Kang, H. G. (2018). Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement. In 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings (pp. 267-270). (2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSIGSYS.2018.8372770
Yang, Haemin ; Choe, Soyeon ; Kim, Keulbit ; Kang, Hong Goo. / Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement. 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 267-270 (2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings).
@inproceedings{8aed3fa515cf4eb8a5ba25a3ec6f7f39,
title = "Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement",
author = "Haemin Yang and Soyeon Choe and Keulbit Kim and Kang, {Hong Goo}",
year = "2018",
month = "6",
day = "4",
doi = "10.1109/ICSIGSYS.2018.8372770",
language = "English",
series = "2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "267--270",
booktitle = "2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings",
address = "United States",

}

Yang, H, Choe, S, Kim, K & Kang, HG 2018, Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement. in 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings. 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 267-270, 2nd International Conference on Signals and Systems, ICSigSys 2018, Bali, Indonesia, 18/5/1. https://doi.org/10.1109/ICSIGSYS.2018.8372770

TY  - GEN
T1  - Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement
AU  - Yang, Haemin
AU  - Choe, Soyeon
AU  - Kim, Keulbit
AU  - Kang, Hong Goo
PY  - 2018/6/4
Y1  - 2018/6/4
UR  - http://www.scopus.com/inward/record.url?scp=85049327273&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85049327273&partnerID=8YFLogxK
U2  - 10.1109/ICSIGSYS.2018.8372770
DO  - 10.1109/ICSIGSYS.2018.8372770
M3  - Conference contribution
AN  - SCOPUS:85049327273
T3  - 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings
SP  - 267
EP  - 270
BT  - 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -

Yang H, Choe S, Kim K, Kang HG. Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement. In 2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2018. p. 267-270. (2018 International Conference on Signals and Systems, ICSigSys 2018 - Proceedings). https://doi.org/10.1109/ICSIGSYS.2018.8372770