Neural speech enhancement systems have improved dramatically in recent years. However, it remains difficult to build systems that operate in real time, that is, causally, with low delay and low complexity. In this paper, we propose a temporal and channel attention framework for a U-Net-based speech enhancement architecture that uses short analysis frames. Specifically, we propose an attention-based temporal refinement network (TRN) that estimates convolutional features according to the importance of each temporal location. By adding the TRN output to the output of the channel-attentive convolution, we can further enhance speech-related features even in channels that receive low attention. To further improve the representational power of the convolutional features, we also apply a squeeze-and-excitation (SE)-based channel attention mechanism to three different network modules: the main convolutional blocks following the TRN, the skip connections, and the residual connections in the bottleneck recurrent neural network (RNN) layer. In particular, channel-wise gates placed on the skip and residual connections reliably control the data flow, preventing redundant information from being passed to subsequent stages. We demonstrate the effectiveness of the proposed TRN and channel-wise gating methods by visualizing the spectral characteristics of the corresponding features, evaluating overall enhancement performance, and performing ablation studies in various configurations. Our proposed real-time enhancement system outperforms several recent neural enhancement models in terms of quality, model size, and complexity.
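The squeeze-and-excitation (SE) channel attention mentioned above follows a well-known generic pattern: average each channel over time (squeeze), pass the resulting vector through a small two-layer fully connected bottleneck ending in a sigmoid (excitation), and rescale each channel by its weight. The sketch that follows shows that pattern in plain Python for a [channels][time] feature map, and reuses it as a channel-wise gate on a skip-connection tensor. It is not the authors' implementation: the helper names (se_channel_weights, scale_channels, gate_skip), the layer sizes, and the choice to compute the gate from the skip features themselves are hypothetical assumptions for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: rows x cols


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: List[float], b: List[float]) -> List[float]:
    """Dense layer: returns w @ v + b."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def se_channel_weights(features: Matrix, w1: Matrix, b1: List[float],
                       w2: Matrix, b2: List[float]) -> List[float]:
    """Generic SE weights for a [channels][time] feature map (hypothetical helper)."""
    # Squeeze: global average over the time axis, one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid, one weight in (0, 1) per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z, b1)]
    return [sigmoid(s) for s in matvec(w2, hidden, b2)]


def scale_channels(features: Matrix, weights: List[float]) -> Matrix:
    """Multiply every time step of channel c by weights[c]."""
    return [[x * w for x in ch] for ch, w in zip(features, weights)]


def gate_skip(skip: Matrix, w1: Matrix, b1: List[float],
              w2: Matrix, b2: List[float]) -> Matrix:
    """Hypothetical channel-wise gate: SE weights computed from the skip tensor
    itself, used to attenuate low-importance channels before they are passed on."""
    return scale_channels(skip, se_channel_weights(skip, w1, b1, w2, b2))


if __name__ == "__main__":
    # Toy example: 2 channels, 2 time steps, bottleneck of size 1.
    feats = [[1.0, 3.0], [0.0, 0.0]]
    weights = se_channel_weights(feats, [[1.0, 0.0]], [0.0], [[1.0], [-1.0]], [0.0, 0.0])
    print(weights)  # first channel is emphasized, second is suppressed
```

In a trained network, w1, b1, w2, and b2 would be learned parameters, and the bottleneck width (the reduction ratio) is a free hyperparameter; the toy values here are arbitrary.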
Journal: Digital Signal Processing: A Review Journal
Publication status: Published, March 2023
Bibliographical note
Funding information:
This project was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2019-0-01558: Study on audio, video, 3d map and activation map generation system using deep generative model) and the Yonsei Signature Research Cluster Program of 2022 (2022-22-0002).
© 2022 Elsevier Inc.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Computer Vision and Pattern Recognition
- Statistics, Probability and Uncertainty
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics
- Electrical and Electronic Engineering