Contextual action cues from camera sensor for multi-stream action recognition

Jongkwang Hong, Bora Cho, Yong Won Hong, Hyeran Byun

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In action recognition research, the two primary types of information are appearance and motion, both learned from RGB images captured by visual sensors. However, depending on the characteristics of an action, contextual information, such as the presence of specific objects or globally shared information in the image, becomes vital to defining the action. For example, the presence of a ball is vital information for distinguishing “kicking” from “running”. Furthermore, some actions share typical global abstract poses, which can serve as a key to classifying actions. Based on these observations, we propose a multi-stream network model that incorporates spatial, temporal, and contextual cues in the image for action recognition. We evaluated the proposed method using C3D or Inflated 3D ConvNet (I3D) as a backbone network on two different action recognition datasets. The results show an overall improvement in accuracy, demonstrating the effectiveness of the proposed method.
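The multi-stream idea in the abstract can be illustrated with a minimal late-fusion sketch. This is not the paper's implementation (the actual streams use C3D/I3D backbones); the stream logits below are hypothetical, and the fusion shown is a simple weighted average of per-stream softmax scores:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_streams(spatial, temporal, contextual, weights=(1.0, 1.0, 1.0)):
    """Late-fuse per-stream class logits by a weighted average of softmax scores."""
    streams = (spatial, temporal, contextual)
    scores = [softmax(s) * w for s, w in zip(streams, weights)]
    return np.sum(scores, axis=0) / np.sum(weights)

# Toy example with 2 classes: "running" = 0, "kicking" = 1.
spatial = np.array([2.0, 1.0])      # appearance alone slightly favors "running"
temporal = np.array([1.5, 1.4])     # motion is ambiguous
contextual = np.array([0.0, 3.0])   # a detected ball strongly suggests "kicking"

fused = fuse_streams(spatial, temporal, contextual)
print(fused.argmax())  # → 1: the contextual cue flips the decision to "kicking"
```

Here the contextual stream carries the object-existence cue ("the ball") that neither appearance nor motion resolves, which is the intuition behind adding a third stream.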

Original language: English
Article number: 1382
Journal: Sensors (Switzerland)
Volume: 19
Issue number: 6
DOI: 10.3390/s19061382
Publication status: Published - 2019 Mar 2

All Science Journal Classification (ASJC) codes

  • Analytical Chemistry
  • Atomic and Molecular Physics, and Optics
  • Biochemistry
  • Instrumentation
  • Electrical and Electronic Engineering

Cite this

Hong, Jongkwang; Cho, Bora; Hong, Yong Won; Byun, Hyeran. / Contextual action cues from camera sensor for multi-stream action recognition. In: Sensors (Switzerland). 2019; Vol. 19, No. 6.
@article{785f388da3e142b79b1cd9df6746b8fb,
title = "Contextual action cues from camera sensor for multi-stream action recognition",
author = "Hong, Jongkwang and Cho, Bora and Hong, {Yong Won} and Byun, Hyeran",
year = "2019",
month = "3",
day = "2",
doi = "10.3390/s19061382",
language = "English",
volume = "19",
journal = "Sensors",
issn = "1424-3210",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "6",

}


TY - JOUR
T1 - Contextual action cues from camera sensor for multi-stream action recognition
AU - Hong, Jongkwang
AU - Cho, Bora
AU - Hong, Yong Won
AU - Byun, Hyeran
PY - 2019/3/2
Y1 - 2019/3/2
UR - http://www.scopus.com/inward/record.url?scp=85063693908&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063693908&partnerID=8YFLogxK
U2 - 10.3390/s19061382
DO - 10.3390/s19061382
M3 - Article
C2 - 30897792
AN - SCOPUS:85063693908
VL - 19
JO - Sensors
JF - Sensors
SN - 1424-3210
IS - 6
M1 - 1382
ER -