In this paper, we propose the Multitask Transformer Network for multitasking on untrimmed video. Analyzing untrimmed video requires capturing the important frames and regions in the spatio-temporal domain. We therefore use the Transformer Network, which captures useful features from CNN representations through an attention mechanism. Motivated by the Action Transformer Network, which repurposes the Transformer for video, we modify its concept of a query, which was specialized for action recognition on trimmed video, to fit untrimmed video. In addition, we change the structure of the Transformer unit to a pre-activation structure so that the residual connections act as identity mappings. We also use the class conversion matrix (CCM), a feature fusion method, to share information between the different tasks. The Multitask Transformer Network combines our Transformer structure with the CCM. We evaluated our model on CoVieW 2019 and improved its performance through post-processing of the prediction results that is suited to the CoVieW 2019 evaluation metric. In the CoVieW 2019 challenge, we placed fourth in the final ranking and first on the scene and action scores.
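The abstract does not spell out the exact layer layout, so the following is only a minimal sketch of what a pre-activation residual unit generally looks like, not the authors' implementation. It assumes layer normalization as the "activation" placed before the residual function and a single-head self-attention as that function; the names `layer_norm`, `self_attention`, `pre_activation_unit`, and `post_activation_unit` are hypothetical. The point it illustrates is that with pre-activation the skip path carries the input through unchanged, whereas a post-activation unit normalizes the sum.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each row (one token/frame feature) to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over the rows of x.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def pre_activation_unit(x, w_q, w_k, w_v):
    # Assumed pre-activation form: normalize first, then add the result
    # to the untouched input, so the skip path is an identity mapping.
    return x + self_attention(layer_norm(x), w_q, w_k, w_v)


def post_activation_unit(x, w_q, w_k, w_v):
    # Post-activation form for comparison: the normalization is applied
    # after the addition, so the skip path is no longer an identity.
    return layer_norm(x + self_attention(x, w_q, w_k, w_v))
```

In this sketch, setting the value projection `w_v` to zeros makes the attention branch contribute nothing; the pre-activation unit then returns its input exactly, while the post-activation unit returns a normalized copy of it.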
|Title of host publication||Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||9|
|Publication status||Published - 2019 Oct|
|Event||17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of|
Duration: 2019 Oct 27 → 2019 Oct 28
Bibliographical note
Funding Information:
This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (NRF-2017M3C4A7069370).
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Computer Vision and Pattern Recognition