Video Instance Segmentation using Inter-Frame Communication Transformers

Sukjun Hwang, Miran Heo, Seoung Wug Oh, Seon Joo Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Citations (Scopus)


We propose a novel end-to-end solution for video instance segmentation (VIS) based on transformers. Recently, the per-clip pipeline shows superior performance over per-frame methods leveraging richer information from multiple frames. However, previous per-clip models require heavy computation and memory usage to achieve frame-to-frame communications, limiting practicality. In this work, we propose Inter-frame Communication Transformers (IFC), which significantly reduces the overhead for information-passing between frames by efficiently encoding the context within the input clip. Specifically, we propose to utilize concise memory tokens as a means of conveying information as well as summarizing each frame scene. The features of each frame are enriched and correlated with other frames through exchange of information between the precisely encoded memory tokens. We validate our method on the latest benchmark sets and achieved state-of-the-art performance (AP 42.6 on YouTube-VIS 2019 val set using the offline inference) while having a considerably fast runtime (89.4 FPS). Our method can also be applied to near-online inference for processing a video in real-time with only a small delay. The code is available at

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
EditorsMarc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
PublisherNeural information processing systems foundation
Number of pages12
ISBN (Electronic)9781713845393
Publication statusPublished - 2021
Event35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: 2021 Dec 62021 Dec 14

Publication series

NameAdvances in Neural Information Processing Systems
ISSN (Print)1049-5258


Conference35th Conference on Neural Information Processing Systems, NeurIPS 2021
CityVirtual, Online

Bibliographical note

Funding Information:
This research was grant funded by the Artificial Intelligence Graduate School Program of Yonsei University, under Grant 2020-0-01361, Korea Evaluation Institute of Industrial Technology (KEIT) funded by the Ministry of Trade, Industry and Energy (10073129), and also supported by the Advanced Robotics Laboratory, part of the Future Technology Center at LG Electronics.

Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing


Dive into the research topics of 'Video Instance Segmentation using Inter-Frame Communication Transformers'. Together they form a unique fingerprint.

Cite this