Multi-Modal Recurrent Attention Networks for Facial Expression Recognition

Jiyoung Lee, Sunok Kim, Seungryong Kim, Kwanghoon Sohn

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Recent methods based on deep neural networks have achieved state-of-the-art performance on various facial expression recognition tasks. Despite such progress, previous research on facial expression recognition has mainly focused on analyzing color videos only. However, the complex emotions that people with different skin colors express through dynamic facial expressions under varying lighting conditions can be fully understood only by integrating information from multi-modal videos. We present a novel method for estimating dimensional emotion states that takes color, depth, and thermal videos as multi-modal input. Our networks, called multi-modal recurrent attention networks (MRAN), learn spatiotemporal attention volumes to robustly recognize facial expressions from attention-boosted feature volumes. We leverage the depth and thermal sequences as guidance priors for the color sequence, enabling the network to focus selectively on emotionally discriminative regions. We also introduce a novel benchmark for multi-modal facial expression recognition, termed multi-modal arousal-valence facial expression recognition (MAVFER), which consists of color, depth, and thermal videos with corresponding continuous arousal-valence scores. Experimental results show that our method achieves state-of-the-art performance in dimensional facial expression recognition on color datasets, including RECOLA, SEWA, and AFEW, as well as on the multi-modal MAVFER dataset.
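The core idea the abstract describes — deriving an attention map from the depth and thermal guidance modalities and using it to boost the color feature volume — can be illustrated with a minimal sketch. This is not the paper's MRAN architecture (which learns recurrent spatiotemporal attention end-to-end); all names here (`guided_attention_fusion`, the averaging of guidance features, the spatial softmax) are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_attention_fusion(color_feat, depth_feat, thermal_feat):
    """Hypothetical sketch of attention-guided fusion.

    Each input is an (H, W, C) feature volume for one frame. The depth and
    thermal features act as guidance priors: they are pooled into a single
    spatial score map, normalized with a spatial softmax, and used to
    reweight the color features, yielding an "attention-boosted" volume.
    """
    # Pool the two guidance modalities into one (H, W) score map.
    guidance = 0.5 * (depth_feat + thermal_feat)      # (H, W, C)
    scores = guidance.mean(axis=-1)                   # (H, W)

    # Spatial softmax: attention weights over all locations sum to 1.
    attn = softmax(scores.ravel()).reshape(scores.shape)

    # Emphasize emotionally discriminative regions of the color features.
    fused = color_feat * attn[..., None]              # (H, W, C)
    return fused, attn
```

In the actual method the attention volumes are learned jointly with recurrent units over the temporal dimension; the fixed pooling rule above only conveys the data flow of "guidance modalities → attention map → boosted color features".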

Original language: English
Article number: 9102419
Pages (from-to): 6977-6991
Number of pages: 15
Journal: IEEE Transactions on Image Processing
Volume: 29
DOIs
Publication status: Published - 2020

Bibliographical note

Funding Information:
Manuscript received June 3, 2019; revised December 5, 2019 and April 4, 2020; accepted May 12, 2020. Date of publication May 27, 2020; date of current version July 8, 2020. This work was supported by the R&D program for Advanced Integrated-Intelligence for Identification (AIID) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT under Grant NRF-2018M3E3A1057289. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Julian Fierrez. (Corresponding author: Kwanghoon Sohn.) Jiyoung Lee, Sunok Kim, and Kwanghoon Sohn are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, South Korea (e-mail: easy00@yonsei.ac.kr; kso428@yonsei.ac.kr; khsohn@yonsei.ac.kr).

Publisher Copyright:
© 1992-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design
