EmbraceNet

A robust deep learning architecture for multimodal classification

Jun Ho Choi, Jong-Seok Lee

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationships effectively but also to ensure robustness against the loss of part of the data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning model, handles cross-modal information carefully, and prevents performance degradation due to the partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance in various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of the data are not available.
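The abstract describes a fusion architecture that combines per-modality features and remains robust when some modalities are missing. As a rough illustration only, the sketch below shows one way such a fusion layer can be built in PyTorch: each modality is projected to a shared dimension and, for every fused feature position, one of the available modalities is chosen by multinomial sampling, so absent modalities are simply excluded from sampling. The class name, parameters, and the specific sampling scheme here are assumptions made for illustration and are not taken from this record.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalFusion(nn.Module):
    # Illustrative fusion layer (not the paper's reference code): per-modality
    # projections plus stochastic feature selection over available modalities.
    def __init__(self, input_dims, fused_dim):
        super().__init__()
        # One projection per modality into a shared feature space.
        self.dockings = nn.ModuleList([nn.Linear(d, fused_dim) for d in input_dims])
        self.fused_dim = fused_dim

    def forward(self, features, availability):
        # features: list of tensors, one per modality, each (batch, input_dim_k)
        # availability: (batch, num_modalities) float mask; 0 marks a missing modality
        docked = torch.stack(
            [dock(x) for dock, x in zip(self.dockings, features)], dim=1
        )  # (batch, num_modalities, fused_dim)
        # Sample uniformly among the modalities that are present for each sample.
        probs = availability / availability.sum(dim=1, keepdim=True)
        choice = torch.multinomial(probs, self.fused_dim, replacement=True)
        mask = F.one_hot(choice, num_classes=docked.size(1)).permute(0, 2, 1).float()
        # Keep, at each fused position, the feature from the sampled modality.
        return (docked * mask).sum(dim=1)  # (batch, fused_dim)

# Example: two modalities (dims 64 and 128) fused to 256 features; the second
# modality is missing for the second sample in the batch.
fusion = MultimodalFusion([64, 128], 256)
x = [torch.randn(2, 64), torch.randn(2, 128)]
avail = torch.tensor([[1.0, 1.0], [1.0, 0.0]])
fused = fusion(x, avail)  # shape: (2, 256)

Because missing modalities receive zero sampling probability, the fused vector is always composed of features that were actually observed, which is one simple way to realize the robustness to partial data loss that the abstract emphasizes.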

Original language: English
Pages (from-to): 259-270
Number of pages: 12
Journal: Information Fusion
Volume: 51
DOIs: https://doi.org/10.1016/j.inffus.2019.02.010
Publication status: Published - 2019 Nov 1

Fingerprint

  • Fusion reactions
  • Learning systems
  • Degradation
  • Deep learning

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture

Cite this

@article{e0b91931c73345c0ac6ad8e060955604,
title = "EmbraceNet: A robust deep learning architecture for multimodal classification",
abstract = "Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationships effectively but also to ensure robustness against the loss of part of the data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning model, handles cross-modal information carefully, and prevents performance degradation due to the partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance in various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of the data are not available.",
author = "Choi, {Jun Ho} and Jong-Seok Lee",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.inffus.2019.02.010",
language = "English",
volume = "51",
pages = "259--270",
journal = "Information Fusion",
issn = "1566-2535",
publisher = "Elsevier",
}

EmbraceNet: A robust deep learning architecture for multimodal classification. / Choi, Jun Ho; Lee, Jong-Seok.

In: Information Fusion, Vol. 51, 01.11.2019, p. 259-270.

Research output: Contribution to journal › Article

TY - JOUR

T1 - EmbraceNet

T2 - A robust deep learning architecture for multimodal classification

AU - Choi, Jun Ho

AU - Lee, Jong-Seok

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationships effectively but also to ensure robustness against the loss of part of the data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning model, handles cross-modal information carefully, and prevents performance degradation due to the partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance in various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of the data are not available.

AB - Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationships effectively but also to ensure robustness against the loss of part of the data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning model, handles cross-modal information carefully, and prevents performance degradation due to the partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance in various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of the data are not available.

UR - http://www.scopus.com/inward/record.url?scp=85062447851&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062447851&partnerID=8YFLogxK

U2 - 10.1016/j.inffus.2019.02.010

DO - 10.1016/j.inffus.2019.02.010

M3 - Article

VL - 51

SP - 259

EP - 270

JO - Information Fusion

JF - Information Fusion

SN - 1566-2535

ER -