From Human Pose Similarity Metric to 3D Human Pose Estimator: Temporal Propagating LSTM Networks

Kyoungoh Lee, Woojae Kim, Sanghoon Lee

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


Predicting a 3D pose directly from a monocular image is a challenging problem. Most pose estimation methods proposed in recent years have shown 'quantitatively' good results (below $\sim$50 mm). However, these methods remain 'perceptually' flawed because their performance is measured only via a simple distance metric. Although this fact is well understood, the reliance on 'quantitative' information implies that the development of 3D pose estimation methods has been slowed down. To address this issue, we first propose a perceptual Pose SIMilarity (PSIM) metric, by assuming that human perception (HP) is highly adapted to extracting structural information from a given signal. Second, we present a perceptually robust 3D pose estimation framework: Temporal Propagating Long Short-Term Memory networks (TP-LSTMs). To this end, we analyze the information-theory-based spatio-temporal posture correlations, including joint interdependency, temporal consistency, and HP. The experimental results clearly show that the proposed PSIM metric achieves a higher correlation with users' subjective opinions than conventional pose metrics. Furthermore, we demonstrate the significant quantitative and perceptual performance improvements of TP-LSTMs compared to existing state-of-the-art methods.
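The abstract does not name the "simple distance metric" it criticizes. As an illustration only, the sketch below assumes it is a mean per-joint Euclidean distance, a common choice in 3D pose evaluation. The function name `mpjpe` and the three-joint poses are hypothetical and are not taken from the paper. The sketch shows two estimates that receive the same score even though their errors are structurally different.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint Euclidean distance between two poses.

    pred, target: sequences of (x, y, z) joint positions in the same
    units (here millimetres) and the same joint order.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("poses must be non-empty with equal joint counts")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical 3-joint poses in mm; the values are illustrative only.
ground_truth = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]

# Estimate A: the whole pose is shifted rigidly by 30 mm along x.
estimate_a = [(30.0, 0.0, 0.0), (30.0, 100.0, 0.0), (30.0, 200.0, 0.0)]

# Estimate B: two joints are exact, one joint is 90 mm off.
estimate_b = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (90.0, 200.0, 0.0)]

error_a = mpjpe(estimate_a, ground_truth)
error_b = mpjpe(estimate_b, ground_truth)
print(error_a, error_b)  # both average to 30.0 mm
```

Estimate A preserves the pose structure while estimate B distorts it, yet an averaged distance cannot tell them apart. This is the kind of gap a structure-aware similarity metric is meant to address; the sketch does not implement PSIM itself.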

Original language: English
Pages (from-to): 1781-1797
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 2
Publication status: Published - 2023 Feb 1

Bibliographical note

Publisher Copyright:
© 1979-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics

