Weighing classes and streams

Toward better methods for two-stream convolutional networks

Hoseong Kim, Youngjung Uh, Seunghyeon Ko, Hyeran Byun

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

The emergence of two-stream convolutional networks has boosted the performance of action recognition by concurrently extracting appearance and motion features from videos. However, most existing approaches simply combine the features by averaging the prediction scores from each recognition stream, without accounting for the fact that some classes benefit from a greater weight on appearance than on motion. We propose a fusion method for two-stream convolutional networks for action recognition that introduces objective functions over the weights under two assumptions: (1) the scores from the two streams do not carry equal weight and (2) the weights vary across classes. We evaluate our method through extensive action recognition experiments on the UCF101, HMDB51, and Hollywood2 datasets. The results show that the proposed approach outperforms the standard two-stream convolutional networks by a large margin (5.7%, 4.8%, and 3.6%) on the UCF101, HMDB51, and Hollywood2 datasets, respectively.
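The difference between plain score averaging and class-dependent stream weighting can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give the objective functions used to learn the weights, so the weights below are hand-picked, hypothetical values, and the function names (`average_fusion`, `classwise_weighted_fusion`, `predict`) are assumptions introduced only for illustration.

```python
from typing import List, Sequence


def average_fusion(appearance: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Baseline fusion: average the per-class scores of the two streams."""
    if len(appearance) != len(motion):
        raise ValueError("both streams must score the same number of classes")
    return [(a + m) / 2.0 for a, m in zip(appearance, motion)]


def classwise_weighted_fusion(
    appearance: Sequence[float],
    motion: Sequence[float],
    appearance_weights: Sequence[float],
) -> List[float]:
    """Fuse scores with a separate weight per class.

    For class c, the appearance score gets weight w_c and the motion
    score gets weight (1 - w_c). How the weights are obtained is left
    open here; in this sketch they are simply supplied by the caller.
    """
    if not (len(appearance) == len(motion) == len(appearance_weights)):
        raise ValueError("scores and weights must have the same length")
    return [
        w * a + (1.0 - w) * m
        for a, m, w in zip(appearance, motion, appearance_weights)
    ]


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical scores for a three-class problem (not taken from the paper).
appearance_scores = [0.6, 0.3, 0.1]
motion_scores = [0.2, 0.7, 0.1]

# Hypothetical per-class weights: class 0 is assumed to rely mostly on appearance.
class_weights = [0.9, 0.5, 0.5]

averaged = average_fusion(appearance_scores, motion_scores)
weighted = classwise_weighted_fusion(appearance_scores, motion_scores, class_weights)

print("averaged:", averaged, "-> class", predict(averaged))
print("weighted:", weighted, "-> class", predict(weighted))
```

With these illustrative numbers the two fusion rules select different classes, which is the situation the abstract argues uniform averaging cannot handle.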

Original language: English
Article number: 053108
Journal: Optical Engineering
Volume: 55
Issue number: 5
DOI: 10.1117/1.OE.55.5.053108
Publication status: Published - 2016 May 1

All Science Journal Classification (ASJC) codes

  • Atomic and Molecular Physics, and Optics
  • Engineering (all)

Cite this

@article{77a29b6e298e4e1e820f7195fa2c6c05,
title = "Weighing classes and streams: Toward better methods for two-stream convolutional networks",
abstract = "The emergence of two-stream convolutional networks has boosted the performance of action recognition by concurrently extracting appearance and motion features from videos. However, most existing approaches simply combine the features by averaging the prediction scores from each recognition stream without realizing that some classes favor greater weight for appearance than motion. We propose a fusion method of two-stream convolutional networks for action recognition by introducing objective functions of weights with two assumptions: (1) the scores from streams do not weigh the same and (2) the weights vary across different classes. We evaluate our method by extensive experiments on UCF101, HMDB51, and Hollywood2 datasets in the context of action recognition. The results show that the proposed approach outperforms the standard two-stream convolutional networks by a large margin (5.7{\%}, 4.8{\%}, and 3.6{\%}) on UCF101, HMDB51, and Hollywood2 datasets, respectively.",
author = "Hoseong Kim and Youngjung Uh and Seunghyeon Ko and Hyeran Byun",
year = "2016",
month = "5",
day = "1",
doi = "10.1117/1.OE.55.5.053108",
language = "English",
volume = "55",
journal = "Optical Engineering",
issn = "0091-3286",
publisher = "SPIE",
number = "5",
}

Weighing classes and streams: Toward better methods for two-stream convolutional networks. / Kim, Hoseong; Uh, Youngjung; Ko, Seunghyeon; Byun, Hyeran.

In: Optical Engineering, Vol. 55, No. 5, 053108, 01.05.2016.

TY  - JOUR
T1  - Weighing classes and streams
T2  - Toward better methods for two-stream convolutional networks
AU  - Kim, Hoseong
AU  - Uh, Youngjung
AU  - Ko, Seunghyeon
AU  - Byun, Hyeran
PY  - 2016/5/1
Y1  - 2016/5/1
AB  - The emergence of two-stream convolutional networks has boosted the performance of action recognition by concurrently extracting appearance and motion features from videos. However, most existing approaches simply combine the features by averaging the prediction scores from each recognition stream without realizing that some classes favor greater weight for appearance than motion. We propose a fusion method of two-stream convolutional networks for action recognition by introducing objective functions of weights with two assumptions: (1) the scores from streams do not weigh the same and (2) the weights vary across different classes. We evaluate our method by extensive experiments on UCF101, HMDB51, and Hollywood2 datasets in the context of action recognition. The results show that the proposed approach outperforms the standard two-stream convolutional networks by a large margin (5.7%, 4.8%, and 3.6%) on UCF101, HMDB51, and Hollywood2 datasets, respectively.
UR  - http://www.scopus.com/inward/record.url?scp=84971524554&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84971524554&partnerID=8YFLogxK
U2  - 10.1117/1.OE.55.5.053108
DO  - 10.1117/1.OE.55.5.053108
M3  - Article
VL  - 55
JO  - Optical Engineering
JF  - Optical Engineering
SN  - 0091-3286
IS  - 5
M1  - 053108
ER  -