Multi-task joint learning for videos in the wild

Yong Won Hong, Hoseong Kim, Hyeran Byun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Most conventional state-of-the-art methods for video analysis achieve strong performance by combining two or more different inputs, e.g. an RGB image, a motion image, or an audio signal, in a two-stream manner. Although these approaches perform well, they implicitly assume that every feature is equally important for classifying the video. This obscures the fact that each class depends on different features to different degrees. To account for the nature of each class, we present class nature specific fusion, which combines the features with class-dependent weights to obtain the best result for each class. In this work, we first represent each frame-level video feature as a spectral image to train convolutional neural networks (CNNs) on the RGB and audio features. We then revise the conventional two-stream fusion method into a class nature specific one by weighting the features differently for different classes. We evaluate our method on the Comprehensive Video Understanding in the Wild dataset to understand how each class responds to each feature in wild videos. Our experimental results not only show an advantage over conventional two-stream fusion, but also illustrate the correlation between the two features, RGB and audio signal, for each class.
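
The abstract does not give the fusion formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each stream is assumed to output per-class scores, and fusion is assumed to be a per-class convex combination of the RGB and audio scores. The names `uniform_fusion`, `class_specific_fusion`, and `rgb_weight`, as well as all numeric values, are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def uniform_fusion(rgb_scores, audio_scores):
    """Conventional two-stream late fusion: both streams count equally."""
    return 0.5 * (rgb_scores + audio_scores)


def class_specific_fusion(rgb_scores, audio_scores, rgb_weight):
    """Assumed class-specific fusion: one mixing weight per class.

    rgb_weight has shape (num_classes,) with values in [0, 1];
    class c receives rgb_weight[c] from the RGB stream and
    1 - rgb_weight[c] from the audio stream.
    """
    rgb_weight = np.asarray(rgb_weight, dtype=float)
    return rgb_weight * rgb_scores + (1.0 - rgb_weight) * audio_scores


# Toy example: one video, three classes (all values are made up).
rgb_scores = softmax(np.array([[2.0, 1.0, 0.1]]))
audio_scores = softmax(np.array([[0.1, 1.0, 2.0]]))

# Hypothetical weights: class 0 relies mostly on RGB, class 2 mostly on audio.
rgb_weight = np.array([0.9, 0.5, 0.2])

fused = class_specific_fusion(rgb_scores, audio_scores, rgb_weight)
predicted_class = fused.argmax(axis=1)
```

Under this assumption, setting every entry of `rgb_weight` to 0.5 recovers uniform two-stream fusion, so the per-class weights can be read as how strongly each class leans on the RGB stream versus the audio stream.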

Original language: English
Title of host publication: CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018
Publisher: Association for Computing Machinery, Inc
Pages: 27-30
Number of pages: 4
ISBN (Electronic): 9781450359764
DOIs: https://doi.org/10.1145/3265987.3265988
Publication status: Published - 2018 Oct 15
Event: 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, CoVieW 2018, in conjunction with ACM Multimedia, MM 2018 - Seoul, Korea, Republic of
Duration: 2018 Oct 22 → …

Publication series

Name: CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018

Other

Other: 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, CoVieW 2018, in conjunction with ACM Multimedia, MM 2018
Country: Korea, Republic of
City: Seoul
Period: 18/10/22 → …

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Health Informatics
  • Media Technology

Cite this

Hong, Y. W., Kim, H., & Byun, H. (2018). Multi-task joint learning for videos in the wild. In CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018 (pp. 27-30). (CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018). Association for Computing Machinery, Inc. https://doi.org/10.1145/3265987.3265988