Gaze detection by estimating the depths and 3D motion of facial features in monocular images

Kang Ryoung Park, Si Wook Nam, Min Suk Lee, Jaihie Kim

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

This paper describes a new method for detecting the gaze position of a user on a monitor from monocular images. To detect the gaze position, we automatically extract facial features (both eyes, the nostrils, and the lip corners) from 2D camera images and estimate the 3D depths and initial 3D positions of those features with a recursive estimation algorithm applied to the starting images. Then, when the user moves his or her head to gaze at a position on the monitor, the moved 3D positions of the features are estimated by 3D motion estimation using an Extended Kalman Filter (EKF) and an affine transform. Finally, the gaze position on the monitor is calculated from the normal vector of the plane determined by the moved 3D feature positions. In particular, to obtain accurate 3D depths and initial feature positions, we unify three coordinate systems (face, monitor, and camera) by perspective transformation. In experiments, the RMS error between the estimated initial 3D feature positions and the real positions (measured by a 3D position tracker sensor) is about 1.28 cm (0.75 cm along the X axis, 0.85 cm along the Y axis, and 0.6 cm along the Z axis), and the 3D motion estimation errors of the feature points obtained by the EKF are about 3.6 degrees in rotation and 1.4 cm in translation. With these estimates, the gaze position on a 17-inch monitor is computed with an RMS error of about 2.1 inches between the calculated and the real gaze positions.
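The final step described above (fit a plane to the moved 3D feature positions, take its normal vector, and intersect it with the monitor) can be made concrete. Below is a minimal Python sketch of that step only, assuming the unified coordinate system places the monitor in the Z = 0 plane; the function names, the least-squares plane fit via SVD, and the choice of the feature centroid as the ray origin are illustrative assumptions, not details taken from the paper.

import numpy as np

def plane_normal(points):
    """Fit a plane to N >= 3 points by least squares and return
    (centroid, unit normal). The SVD of the centered points gives
    the direction of least variance, i.e. the plane normal."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def gaze_on_monitor(feature_points_3d):
    """Intersect the facial-plane normal, anchored at the feature
    centroid, with the monitor plane (assumed here to be Z = 0 in
    the unified face/monitor/camera coordinate system)."""
    origin, n = plane_normal(feature_points_3d)
    if abs(n[2]) < 1e-9:
        raise ValueError("gaze direction is parallel to the monitor plane")
    if n[2] > 0:
        n = -n                # orient the normal toward the monitor
    t = -origin[2] / n[2]     # ray parameter where Z reaches 0
    hit = origin + t * n
    return hit[:2]            # (X, Y) gaze position on the monitor

# Hypothetical moved 3D positions (in cm) of both eyes, nostrils and
# lip corners, roughly 60 cm in front of the screen.
features = np.array([
    [-3.0,  5.0, 60.0],  # left eye
    [ 3.0,  5.0, 60.5],  # right eye
    [-1.0,  1.0, 58.0],  # left nostril
    [ 1.0,  1.0, 58.5],  # right nostril
    [-2.0, -2.0, 59.0],  # left lip corner
    [ 2.0, -2.0, 59.5],  # right lip corner
])
print(gaze_on_monitor(features))

In the paper itself the moved feature positions come from applying the EKF-estimated rotation and translation to the initial 3D features, so a routine like this would only be the last stage of the pipeline; the reported 2.1-inch RMS accuracy is a property of the full system, not of this fragment.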

Original language: English
Pages (from-to): 2274-2284
Number of pages: 11
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Volume: E82-A
Issue number: 10
Publication status: Published - 1999

Fingerprint

Extended Kalman filter
Motion estimation
Affine transform
Recursive estimation algorithm
Error analysis
RMS error
Estimation error
Feature point
Normal vector
Camera
Monitor
Face
Sensors

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering
  • Applied Mathematics

Cite this

@article{d4f84e0a510146f28d98e9f1e14aef3c,
title = "Gaze detection by estimating the depths and 3D motion of facial features in monocular images",
author = "Park, {Kang Ryoung} and Nam, {Si Wook} and Lee, {Min Suk} and Kim, {Jaihie}",
year = "1999",
language = "English",
journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
volume = "E82-A",
number = "10",
pages = "2274--2284",
issn = "0916-8508",
publisher = "Maruzen Co., Ltd.",
url = "http://www.scopus.com/inward/record.url?scp=9444224033&partnerID=8YFLogxK",
}
