Abstract
In this paper, we present a speech enhancement algorithm based on a stacked U-Net structure that reduces the number of parameters and supports real-time processing. To significantly reduce the number of network parameters, we propose a stacked structure in which several shallow U-Nets with fewer convolutional layer channels are cascaded. However, simply stacking small-scale U-Nets cannot sufficiently compensate for the performance loss caused by the reduced parameter count. To overcome this problem, we propose a high-level feature transfer method that passes all the multi-channel output features, obtained before the intermediate output layer, to the next stage. Furthermore, our proposed model can process short analysis frames because its downsampling and upsampling blocks are much smaller than those of the conventional Wave U-Net method; these smaller layers make our proposed model suitable for low-delay online processing. Experiments show that our proposed method outperforms the conventional Wave U-Net method on almost all objective measures while requiring only 7.21% of its network parameters. In addition, our model can be successfully implemented in real time in both GPU and CPU environments.
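The parameter trade-off described in the abstract can be illustrated with a rough count. The sketch below is not the authors' architecture: the toy 1-D U-Net layout, the helper names (`conv1d_params`, `unet_params`, `stacked_params`, `frame_multiple`), and all channel, depth, and kernel values are hypothetical choices made only for illustration. It assumes each encoder level halves the time resolution and that, with feature transfer, the next stage receives the multi-channel decoder features instead of the single-channel intermediate output.

```python
def conv1d_params(c_in, c_out, kernel):
    """Weights plus biases of one 1-D convolution layer."""
    return c_in * c_out * kernel + c_out


def unet_params(in_ch, base_ch, depth, kernel=15):
    """Parameter count of a toy 1-D U-Net (hypothetical layout).

    Encoder level i has base_ch * i channels; each decoder level takes the
    previous decoder output concatenated with the matching encoder skip.
    Returns (total parameters, channels of the last decoder feature map).
    """
    total = 0
    skips = []
    c_prev = in_ch
    for level in range(1, depth + 1):          # encoder
        c = base_ch * level
        total += conv1d_params(c_prev, c, kernel)
        skips.append(c)
        c_prev = c
    c = base_ch * (depth + 1)                  # bottleneck
    total += conv1d_params(c_prev, c, kernel)
    c_prev = c
    for c_skip in reversed(skips):             # decoder with skip concatenation
        total += conv1d_params(c_prev + c_skip, c_skip, kernel)
        c_prev = c_skip
    feature_ch = c_prev
    total += conv1d_params(feature_ch, 1, 1)   # 1x1 output layer to a waveform
    return total, feature_ch


def stacked_params(num_stages, base_ch, depth, kernel=15, transfer_features=True):
    """Parameter count of several shallow U-Nets in cascade.

    With transfer_features=True, stage k+1 takes the multi-channel features
    of stage k as input (assumed reading of "high-level feature transfer");
    otherwise it takes only the single-channel intermediate output.
    """
    total = 0
    in_ch = 1
    for _ in range(num_stages):
        stage_total, feature_ch = unet_params(in_ch, base_ch, depth, kernel)
        total += stage_total
        in_ch = feature_ch if transfer_features else 1
    return total


def frame_multiple(depth):
    """Frame length must be a multiple of this if every level halves time."""
    return 2 ** depth


if __name__ == "__main__":
    # Hypothetical sizes: one deep, wide U-Net vs. four shallow, narrow ones.
    single, _ = unet_params(in_ch=1, base_ch=24, depth=12)
    stacked = stacked_params(num_stages=4, base_ch=8, depth=3)
    print(f"single deep U-Net : {single} parameters, "
          f"frame multiple {frame_multiple(12)}")
    print(f"stacked shallow   : {stacked} parameters, "
          f"frame multiple {frame_multiple(3)}")
    print(f"ratio             : {100 * stacked / single:.2f}%")
```

The ratio printed here depends entirely on the hypothetical sizes above and is not expected to reproduce the 7.21% reported in the abstract. The sketch only shows why shallow, narrow stages are cheap, why feature transfer adds a modest cost at the first layer of each later stage, and why fewer downsampling levels allow shorter frames.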
Original language | English |
---|---|
Title of host publication | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 591-595 |
Number of pages | 5 |
ISBN (Electronic) | 9789881476890 |
Publication status | Published - 2021 |
Event | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan Duration: 2021 Dec 14 → 2021 Dec 17 |
Publication series
Name | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings |
---|
Conference
Conference | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 |
---|---|
Country/Territory | Japan |
City | Tokyo |
Period | 21/12/14 → 21/12/17 |
Bibliographical note
Funding Information:
This project was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2019-0-01558: Study on audio, video, 3D map and activation map generation system using deep generative model).
Publisher Copyright:
© 2021 APSIPA.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Signal Processing
- Instrumentation