Real-time neural speech enhancement based on temporal refinement network and channel-wise gating methods

Jinyoung Lee, Hong Goo Kang

Research output: Contribution to journal › Article › peer-review

Abstract

Neural speech enhancement systems have seen dramatic improvements in performance recently. However, it is still difficult to create systems that can operate in real-time, with low delay, low complexity, and causality. In this paper, we propose a temporal and channel attention framework for a U-Net-based speech enhancement architecture that uses short analysis frame lengths. Specifically, we propose an attention-based temporal refinement network (TRN) that estimates convolutional features subject to the importance of temporal location. By adding the TRN output to the channel-attentive convolution output, we can further enhance speech-related features even in low-attentive channel outputs. To further improve the representation power of the convolutional features, we also apply a squeeze-and-excitation (SE)-based channel attention mechanism for three different network modules: main convolutional blocks after processing the TRN, skip connections, and residual connections in the bottleneck recurrent neural network (RNN) layer. In particular, a channel-wise gate architecture placed on the skip connections and residual connections reliably controls the data flow, which avoids transferring redundant information to the following stages. We show the effectiveness of the proposed TRN and channel-wise gating methods by visualizing the spectral characteristics of the corresponding features, evaluating overall enhancement performance, and performing ablation studies in various configurations. Our proposed real-time enhancement system outperforms several recent neural enhancement models in terms of quality, model size, and complexity.
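The squeeze-and-excitation (SE) style channel gate that the abstract places on skip and residual connections can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `se_channel_gate`, the tensor layout (channels, frames, frequency bins), the reduction ratio, the random weights, and the choice to squeeze over frequency only (so each frame's gate depends on that frame alone, which keeps the operation causal) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_channel_gate(features, w1, b1, w2, b2):
    """Hypothetical SE-style gate for features shaped (channels, frames, freq_bins)."""
    # Squeeze: average over frequency only, so a frame's gate never looks at
    # other frames (an assumption made here to keep the sketch causal).
    squeezed = features.mean(axis=2)                        # (C, T)
    # Excitation: bottleneck MLP across channels, applied per frame.
    hidden = np.maximum(0.0, w1 @ squeezed + b1[:, None])   # (C // r, T)
    gate = sigmoid(w2 @ hidden + b2[:, None])               # (C, T), values in (0, 1)
    # Scale: each channel of each frame is attenuated by its gate value.
    return features * gate[:, :, None], gate


# Illustrative sizes and random weights (assumed, not taken from the paper).
rng = np.random.default_rng(0)
C, T, F, r = 8, 5, 16, 4
features = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

gated, gate = se_channel_gate(features, w1, b1, w2, b2)
print(gated.shape, gate.shape)
```

In a U-Net-style model, a gate of this kind would be applied to the encoder features before they are passed across a skip connection, so that channels with low gate values contribute less to the decoder input.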

Original language: English
Article number: 103879
Journal: Digital Signal Processing: A Review Journal
Volume: 133
Publication status: Published - 2023 Mar

Bibliographical note

Funding Information:
This project was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2019-0-01558: Study on audio, video, 3d map and activation map generation system using deep generative model) and the Yonsei Signature Research Cluster Program of 2022 (2022-22-0002).

Publisher Copyright:
© 2022 Elsevier Inc.

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Statistics, Probability and Uncertainty
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
  • Electrical and Electronic Engineering
