We present spatiotemporal attention-based multimodal deep neural networks for dimensional emotion recognition in multimodal audio-visual video sequences. To learn temporal attention that discriminatively focuses on emotionally salient parts of speech audio, we formulate a temporal attention network using deep neural networks (DNNs). In addition, to learn spatiotemporal attention that selectively focuses on emotionally salient parts of facial videos, we formulate a spatiotemporal encoder-decoder network using convolutional LSTM (ConvLSTM) modules, which is learned implicitly without any pixel-level annotations. Leveraging the spatiotemporal attention, we also formulate 3D convolutional neural networks (3D-CNNs) to robustly recognize dimensional emotion in facial videos. Furthermore, to exploit multimodal information, we fuse the audio and video features in an emotion regression model. Experimental results show that our method achieves state-of-the-art results in dimensional emotion recognition, with the highest concordance correlation coefficient (CCC) on the AV+EC 2017 dataset.
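The evaluation metric named in the abstract, the concordance correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `concordance_correlation_coefficient`, the use of NumPy, and the choice to return NaN when the metric is undefined are all assumptions made for illustration. The formula is the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), computed here with population statistics.

```python
import numpy as np


def concordance_correlation_coefficient(y_true, y_pred):
    """Standard CCC between two equal-length 1-D sequences.

    Hypothetical helper for illustration; uses population (ddof=0) statistics.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim != 1 or y_true.shape != y_pred.shape:
        raise ValueError("expected two 1-D arrays of equal length")

    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    # Population covariance between labels and predictions.
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0.0:
        # Both sequences are constant and equal: the metric is undefined.
        # Returning NaN is an assumption of this sketch, not a convention
        # taken from the paper.
        return float("nan")
    return float(2.0 * cov / denom)
```

Unlike the Pearson correlation, CCC penalizes a constant offset between predictions and labels: for labels `[0, 1, 2]` and predictions `[1, 2, 3]` the Pearson correlation is 1, while this function returns 4/7 (about 0.571).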
|Title of host publication||AVSU 2018 - Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, Co-located with MM 2018|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||6|
|Publication status||Published - 2018 Oct 26|
|Event||2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, AVSU 2018, co-located with MM 2018 - Seoul, Korea, Republic of|
Duration: 2018 Oct 26 → …
Bibliographical note
Funding Information:
This research was supported by the International Research Development Program of the National Research Foundation of Korea (NRF) funded by Ministry of Science and ICT (NRF-2017K1A3A1A16066838).
© 2018 Association for Computing Machinery.
All Science Journal Classification (ASJC) codes
- Media Technology
- Computer Graphics and Computer-Aided Design
- Computer Vision and Pattern Recognition