Self-Supervised Audio Spatialization with Correspondence Classifier

Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)


Spatial audio is an essential medium for delivering an immersive 3D visual and auditory experience to audiences. However, the recording devices and techniques required to capture it are expensive or inaccessible to the general public. In this work, we propose a self-supervised audio spatialization network that generates spatial audio given the corresponding video and monaural audio. To enhance spatialization performance, we use an auxiliary classifier to distinguish ground-truth videos from those whose left and right audio channels have been swapped. We collect a large-scale video dataset with spatial audio to validate the proposed method. Experimental results demonstrate the effectiveness of the proposed model on the audio spatialization task.
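The self-supervision signal described in the abstract comes from channel swapping: each ground-truth stereo clip yields a "correct" positive example, and the same clip with its left and right channels exchanged yields a negative example for the auxiliary correspondence classifier. A minimal sketch of this data-construction step (the function names and array layout are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def swap_channels(stereo):
    """Exchange the left and right channels of a (2, T) stereo clip."""
    # Reversing the first (channel) axis turns L/R into R/L.
    return stereo[::-1].copy()

def make_correspondence_batch(stereo_clips):
    """Build training pairs for the correspondence classifier.

    Each ground-truth clip is labeled 1 (channels match the video);
    its channel-swapped copy is labeled 0 (channels contradict the video).
    Returns (batch, labels) with batch shape (2 * n_clips, 2, T).
    """
    xs, ys = [], []
    for clip in stereo_clips:
        xs.append(clip)
        ys.append(1)               # ground-truth: audio corresponds to video
        xs.append(swap_channels(clip))
        ys.append(0)               # swapped: audio contradicts video
    return np.stack(xs), np.array(ys)
```

Because the negatives are generated from the positives themselves, no manual annotation is needed, which is what makes the training self-supervised.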

Original language: English
Title of host publication: 2019 IEEE International Conference on Image Processing, ICIP 2019 - Proceedings
Publisher: IEEE Computer Society
Number of pages: 5
ISBN (Electronic): 9781538662496
Publication status: Published - 2019 Sept
Event: 26th IEEE International Conference on Image Processing, ICIP 2019 - Taipei, Taiwan, Province of China
Duration: 2019 Sept 22 - 2019 Sept 25

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880


Conference: 26th IEEE International Conference on Image Processing, ICIP 2019
Country/Territory: Taiwan, Province of China

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing


