Stacked U-Net with High-Level Feature Transfer for Parameter Efficient Speech Enhancement

Jinyoung Lee, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we present a stacked U-Net structure-based speech enhancement algorithm with parameter reduction and real-time processing. To significantly reduce the number of network parameters, we propose a stacked structure in which several shallow U-Nets with fewer convolutional layer channels are cascaded. However, simply stacking the small-scale U-Nets cannot sufficiently compensate for the performance loss caused by the lack of parameters. To overcome this problem, we propose a high-level feature transfer method that passes all the multi-channel output features, which are obtained before passing through the intermediate output layer, to the next stage. Furthermore, our proposed model can process analysis frames with short lengths because its downsampling and upsampling blocks are much smaller than those of the conventional Wave U-Net method; these smaller layers make our proposed model suitable for low-delay online processing. Experiments show that our proposed method outperforms the conventional Wave U-Net method on almost all objective measures and requires only 7.21% of the network parameters when compared to the conventional method. In addition, our model can be successfully implemented in real time on both GPU and CPU environments.
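
The parameter argument in the abstract can be illustrated with a rough counting sketch. Everything below is an assumption made for illustration: the layer layout of the toy U-Net, the channel widths, the kernel size, and the choice that a stage with feature transfer receives only the previous stage's feature channels as input. None of these values are taken from the paper, and the sketch does not attempt to reproduce the reported 7.21% figure.

```python
def conv1d_params(in_ch, out_ch, kernel):
    """Weights plus biases of a single 1-D convolution layer."""
    return in_ch * out_ch * kernel + out_ch


def unet_params(in_ch, widths, kernel=5, out_ch=1):
    """Count parameters of a toy 1-D U-Net (assumed layout, not the paper's).

    widths lists the channel width of each encoder level. Returns
    (total_parameters, feature_channels), where feature_channels is the
    width of the last decoder feature map, before the output layer.
    """
    total = 0
    prev = in_ch
    for w in widths:  # encoder: one convolution per level
        total += conv1d_params(prev, w, kernel)
        prev = w
    total += conv1d_params(prev, prev, kernel)  # bottleneck
    for w in reversed(widths):  # decoder: input is concatenated with the skip
        total += conv1d_params(prev + w, w, kernel)
        prev = w
    total += conv1d_params(prev, out_ch, 1)  # 1x1 (intermediate) output layer
    return total, prev


def stacked_params(num_stages, widths, kernel=5, transfer=True):
    """Count parameters of several cascaded shallow U-Nets.

    With transfer=True, each later stage takes the previous stage's
    multi-channel features as input (hypothetical wiring). With
    transfer=False, it takes only the single-channel intermediate output.
    """
    total = 0
    in_ch = 1  # the first stage sees the single-channel noisy waveform
    for _ in range(num_stages):
        stage_total, feat_ch = unet_params(in_ch, widths, kernel)
        total += stage_total
        in_ch = feat_ch if transfer else 1
    return total


if __name__ == "__main__":
    # Hypothetical configurations: one deep, wide U-Net versus three
    # shallow, narrow U-Nets in cascade.
    deep_total, _ = unet_params(1, [16, 32, 64, 128, 256])
    stacked_total = stacked_params(3, [8, 16], transfer=True)
    print("deep U-Net parameters:", deep_total)
    print("stacked U-Net parameters:", stacked_total)
    print("ratio: {:.2%}".format(stacked_total / deep_total))
```

In this toy setting, feature transfer only widens the first convolution of each later stage, so it adds few parameters compared with the saving from using shallow, narrow stages.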

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 591-595
Number of pages: 5
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 2021 Dec 14 to 2021 Dec 17

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 21/12/14 to 21/12/17

Bibliographical note

Funding Information:
This project was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2019-0-01558: Study on audio, video, 3D map and activation map generation system using deep generative model).

Publisher Copyright:
© 2021 APSIPA.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation
