TY - GEN
T1 - Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks
AU - Lee, Inwoong
AU - Kim, Doyoung
AU - Kang, Seoungyoon
AU - Lee, Sanghoon
N1 - Publisher Copyright:
© 2017 IEEE.
Copyright:
Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2017/12/22
Y1 - 2017/12/22
N2 - This paper addresses the problems of feature representation of skeleton joints and the modeling of temporal dynamics to recognize human actions. Traditional methods generally use relative coordinate systems dependent on some joints, and model only the long-term dependency while excluding short-term and medium-term dependencies. Instead of taking raw skeletons as the input, we transform the skeletons into another coordinate system to obtain robustness to scale, rotation, and translation, and then extract salient motion features from them. Considering that Long Short-Term Memory (LSTM) networks with various time-step sizes can model various attributes well, we propose novel ensemble Temporal Sliding LSTM (TS-LSTM) networks for skeleton-based action recognition. The proposed network is composed of multiple parts containing short-term, medium-term, and long-term TS-LSTM networks, respectively. In our network, we utilize an average ensemble among multiple parts as a final feature to capture various temporal dependencies. We evaluate the proposed networks and additional architectures to verify the effectiveness of the proposed networks, and also compare them with several other methods on five challenging datasets. The experimental results demonstrate that our network models achieve state-of-the-art performance through various temporal features. Additionally, we analyze the relation between the recognized actions and the multi-term TS-LSTM features by visualizing the softmax features of multiple parts.
AB - This paper addresses the problems of feature representation of skeleton joints and the modeling of temporal dynamics to recognize human actions. Traditional methods generally use relative coordinate systems dependent on some joints, and model only the long-term dependency while excluding short-term and medium-term dependencies. Instead of taking raw skeletons as the input, we transform the skeletons into another coordinate system to obtain robustness to scale, rotation, and translation, and then extract salient motion features from them. Considering that Long Short-Term Memory (LSTM) networks with various time-step sizes can model various attributes well, we propose novel ensemble Temporal Sliding LSTM (TS-LSTM) networks for skeleton-based action recognition. The proposed network is composed of multiple parts containing short-term, medium-term, and long-term TS-LSTM networks, respectively. In our network, we utilize an average ensemble among multiple parts as a final feature to capture various temporal dependencies. We evaluate the proposed networks and additional architectures to verify the effectiveness of the proposed networks, and also compare them with several other methods on five challenging datasets. The experimental results demonstrate that our network models achieve state-of-the-art performance through various temporal features. Additionally, we analyze the relation between the recognized actions and the multi-term TS-LSTM features by visualizing the softmax features of multiple parts.
UR - http://www.scopus.com/inward/record.url?scp=85041927537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041927537&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2017.115
DO - 10.1109/ICCV.2017.115
M3 - Conference contribution
AN - SCOPUS:85041927537
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1012
EP - 1020
BT - Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Computer Vision, ICCV 2017
Y2 - 22 October 2017 through 29 October 2017
ER -