Eidetic 3D LSTM: A model for video prediction and beyond

Yunbo Wang, Lu Jiang, Ming Hsuan Yang, Li Jia Li, Mingsheng Long, Li Fei-Fei

Research output: Contribution to conference (Paper)

Abstract

Spatiotemporal predictive learning, though long considered a promising self-supervised feature learning method, seldom shows its effectiveness beyond future video prediction. The reason is that it is difficult to learn good representations for both short-term frame dependency and long-term high-level relations. We present a new model, Eidetic 3D LSTM (E3D-LSTM), that integrates 3D convolutions into RNNs. The encapsulated 3D-Conv makes local perceptrons of RNNs motion-aware and enables the memory cell to store better short-term features. For long-term relations, we make the present memory state interact with its historical records via a gate-controlled self-attention module. We describe this memory transition mechanism as eidetic because it can effectively recall stored memories across multiple time stamps even after long periods of disturbance. We first evaluate the E3D-LSTM network on widely used future video prediction datasets and achieve state-of-the-art performance. We then show that the E3D-LSTM network also performs well on early activity recognition, which infers what is happening or what will happen after observing only a limited number of video frames. This task aligns well with video prediction in modeling action intentions and tendencies.
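
The gate-controlled self-attention over historical memory described in the abstract can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: memory states are treated as flat vectors rather than 3D-convolutional feature maps, and the function names (`recall`, `eidetic_update`) and the exact form of the update (attention-weighted recall added to the previous memory, then layer-normalized) are assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned scale/shift).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def recall(query, history):
    """Attention-weighted sum of past memory states.

    query:   (d,)     gate activation acting as the attention query (assumed)
    history: (tau, d) the last tau memory states, oldest first
    """
    scores = history @ query      # similarity of the query to each past state
    weights = softmax(scores)     # attention weights over time steps, sum to 1
    return weights @ history      # (d,) recalled memory

def eidetic_update(recall_gate, input_gate, candidate, history):
    """One hypothetical memory transition step.

    New memory = gated new input + normalized(previous memory + recalled memory).
    """
    previous = history[-1]
    recalled = recall(recall_gate, history)
    return input_gate * candidate + layer_norm(previous + recalled)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.normal(size=(4, 8))          # 4 past states of dimension 8
    recall_gate = rng.uniform(size=8)          # stands in for a sigmoid gate output
    input_gate = rng.uniform(size=8)
    candidate = np.tanh(rng.normal(size=8))    # stands in for the input modulation
    new_memory = eidetic_update(recall_gate, input_gate, candidate, history)
    print(new_memory.shape)
```

Because the attention weights span every stored time step, a past state that matches the current query can contribute to the new memory regardless of how many steps ago it was written, which is the intuition behind the recall behavior the abstract describes.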

Original language: English
Publication status: Published - 2019 Jan 1
Event: 7th International Conference on Learning Representations, ICLR 2019 - New Orleans, United States
Duration: 2019 May 6 to 2019 May 9

Conference

Conference: 7th International Conference on Learning Representations, ICLR 2019
Country: United States
City: New Orleans
Period: 19/5/6 to 19/5/9

All Science Journal Classification (ASJC) codes

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
