Predicting multiple future frames from a given video is challenging due to factors such as camera motion, dynamically moving objects, and occlusions. While recent deep learning methods have made significant progress on video prediction, most predict only the immediate next frame or a fixed number of future frames. To obtain longer-term predictions, existing techniques typically apply the model iteratively to its own outputs, which results in blurry or inconsistent predictions. In this work, we present a new approach that predicts an arbitrary number of future video frames with a single forward pass through the network. Instead of directly predicting a fixed number of future optical flows or frames, we learn temporal motion encodings, i.e., temporal motion basis vectors together with a network that predicts their coefficients. The learned motion basis can easily be extended to arbitrary length at inference time, enabling prediction of an arbitrary number of future frames. Experiments on benchmark datasets show that our approach performs favorably against several competitive techniques even in the next-frame prediction setting. Under 5-frame and 10-frame prediction settings, the proposed method achieves larger performance gains over state-of-the-art techniques that process predictions iteratively.
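The core idea in the abstract — representing future motion as coefficients over a temporal basis that can be evaluated at any horizon — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper learns its basis, whereas a fixed DCT-style cosine basis is assumed here purely to show why the same coefficients can synthesize trajectories of arbitrary length.

```python
import numpy as np

def temporal_basis(num_basis, num_frames):
    """Build a cosine (DCT-like) temporal basis of shape (num_basis, num_frames).

    Assumption: a fixed analytic basis stands in for the learned one, since
    any basis defined as a function of time can be evaluated at any horizon.
    """
    t = np.arange(num_frames)
    return np.stack([np.cos(np.pi * k * (t + 0.5) / num_frames)
                     for k in range(num_basis)])

def predict_motion(coeffs, num_frames):
    """Combine basis coefficients into a motion trajectory of length num_frames.

    coeffs: (num_basis,) values that, in the paper's setting, a network would
    predict from the input frames in a single forward pass.
    """
    basis = temporal_basis(len(coeffs), num_frames)
    return coeffs @ basis  # (num_frames,) motion values, one per future frame

# Stand-in for network output; the same coefficients serve both horizons.
coeffs = np.array([0.5, 0.2, -0.1])
motion_5 = predict_motion(coeffs, 5)    # 5-frame prediction
motion_10 = predict_motion(coeffs, 10)  # 10-frame prediction, no iteration
```

The point of the sketch is that extending the prediction horizon only requires evaluating the basis over more time steps; no iterative reprocessing of predicted frames is involved.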
|Title of host publication||2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings|
|Publisher||IEEE Computer Society|
|Number of pages||5|
|Publication status||Published - 2022|
|Event||29th IEEE International Conference on Image Processing, ICIP 2022 - Bordeaux, France|
|Duration||2022 Oct 16 → 2022 Oct 19|
|Name||Proceedings - International Conference on Image Processing, ICIP|
|Conference||29th IEEE International Conference on Image Processing, ICIP 2022|
|Period||22/10/16 → 22/10/19|
|Bibliographical note||Publisher Copyright: © 2022 IEEE.|
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Signal Processing