A cross-channel attention-based Wave-U-Net for multi-channel speech enhancement

Minh Tri Ho, Jinyoung Lee, Bong Ki Lee, Dong Hoon Yi, Hong Goo Kang

Research output: Contribution to journal, Conference article, peer-review

3 Citations (Scopus)

Abstract

In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as well as spectral information, it is challenging to effectively train a multi-channel deep learning system in an end-to-end framework. With a channel-independent encoding architecture for spectral estimation and a strategy to extract spatial information through an inter-channel attention mechanism, we implement a multi-channel speech enhancement system that has high performance even in reverberant and extremely noisy environments. Experimental results show that the proposed architecture has superior performance in terms of signal-to-distortion ratio improvement (SDRi), short-time objective intelligibility (STOI), and phoneme error rate (PER) for speech recognition.
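
For readers unfamiliar with the SDRi metric named in the abstract, the following is a minimal sketch of how an SDR improvement could be computed. It assumes the simple energy-ratio definition of SDR; the paper may instead use the BSS Eval variant, which is not reproduced here. The function names and the toy signals are hypothetical and chosen only for illustration.

```python
import math


def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the plain definition, not the BSS Eval SDR.
    """
    signal_energy = sum(s * s for s in reference)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / distortion_energy)


def sdr_improvement_db(reference, noisy, enhanced):
    """SDRi: SDR of the enhanced signal minus SDR of the unprocessed input."""
    return sdr_db(reference, enhanced) - sdr_db(reference, noisy)


# Toy signals (hypothetical): a sinusoid with an additive interferer whose
# amplitude is reduced by a factor of 10 after "enhancement".
clean = [math.sin(0.1 * n) for n in range(1000)]
noisy = [s + 0.5 * math.cos(1.7 * n) for n, s in enumerate(clean)]
enhanced = [s + 0.05 * math.cos(1.7 * n) for n, s in enumerate(clean)]

sdri = sdr_improvement_db(clean, noisy, enhanced)
print(f"SDRi = {sdri:.2f} dB")
```

In this toy case the residual interference amplitude drops tenfold, so its energy drops by a factor of 100 and the SDR improvement is about 20 dB.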

Original language: English
Pages (from-to): 4049-4053
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
DOIs
Publication status: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 2020 Oct 25 to 2020 Oct 29

Bibliographical note

Funding Information:
This work is supported and funded by LG Electronics Co., Ltd.

Publisher Copyright:
Copyright © 2020 ISCA

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
