Video multitask transformer network

Hongje Seong, Junhyuk Hyun, Euntai Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose the Multitask Transformer Network for multitasking on untrimmed video. Analyzing untrimmed video requires capturing the important frames and regions in the spatio-temporal domain. We therefore utilize the Transformer Network, which can capture useful features from CNN representations through an attention mechanism. Motivated by the Action Transformer Network, a model that repurposes the Transformer for video, we modify its concept of the query, which was specialized for action recognition on trimmed video, to fit untrimmed video. In addition, we change the structure of the Transformer unit to a pre-activation structure to obtain identity mapping on the residual connections. We also utilize the class conversion matrix (CCM), a feature fusion method, to share information between different tasks. Combining our Transformer structure with the CCM yields the proposed Multitask Transformer Network for multitasking on untrimmed video. Finally, we evaluated our model on CoVieW 2019 and enhanced its performance through post-processing of the prediction results, suited to the CoVieW 2019 evaluation metric. In the CoVieW 2019 challenge, we placed fourth in the final ranking and first on the scene and action scores.
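The pre-activation idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact layer order, so the ordering below (normalize, activate, transform, then add the untouched input) is an assumption borrowed from the general pre-activation residual design, not the authors' implementation. All function names and the toy weight matrix are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def linear(x, weight):
    """Matrix-vector product; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def pre_activation_unit(x, weight):
    """Assumed pre-activation order: y = x + W * relu(norm(x)).

    Normalization and activation sit inside the residual branch,
    so the skip path carries x through unchanged.
    """
    branch = linear(relu(layer_norm(x)), weight)
    return [a + b for a, b in zip(x, branch)]


def post_activation_unit(x, weight):
    """Post-normalization order for contrast: y = norm(x + W * relu(x)).

    Here the normalization is applied after the addition, so the
    skip path is no longer a pure identity.
    """
    branch = linear(relu(x), weight)
    return layer_norm([a + b for a, b in zip(x, branch)])
```

With a residual branch that outputs zeros (for example, an all-zero weight matrix), `pre_activation_unit` returns its input exactly, while `post_activation_unit` still rescales it. That difference is what "identity mapping on residual connections" refers to in this sketch.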

Original language: English
Title of host publication: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1553-1561
Number of pages: 9
ISBN (Electronic): 9781728150239
DOIs: https://doi.org/10.1109/ICCVW.2019.00194
Publication status: Published - 2019 Oct
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of
Duration: 2019 Oct 27 to 2019 Oct 28

Publication series

Name: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Country: Korea, Republic of
City: Seoul
Period: 19/10/27 to 19/10/28

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition

  • Cite this

    Seong, H., Hyun, J., & Kim, E. (2019). Video multitask transformer network. In Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 (pp. 1553-1561). [9022029] (Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCVW.2019.00194