Vision based speaker location detection

Jaehyun Lim, Jonggeun Park, Chulhee Lee

Research output: Contribution to journal › Conference article

Abstract

Speaker location detection in video conferencing is generally audio-based. However, the physical room environment, which is beyond the control of the speaker detection system, can severely alter room acoustics. Room acoustics introduce interference and can degrade the performance of audio-based speaker detection systems. In this paper, we propose a video-based speaker detection method that can be used independently or together with audio-based detection systems. The speaker location information is intended for three-dimensional audio reproduction, to make video conferences more realistic. In the proposed method, we detect moving lips in video sequences: we first detect the lips using color information and then determine whether they are moving. Experiments with real videos show promising results.
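The abstract describes the method only at a high level: color-based lip detection followed by a test for lip motion, with the resulting location feeding 3-D audio. The following is a minimal sketch of how such a pipeline could look. It is not the authors' method. The pseudo-hue rule R/(R+G), both thresholds, the mask-difference motion test, and the column-to-position mapping for audio panning are all assumptions, and every function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Frame = List[List[Pixel]]     # frame[row][col]
Mask = List[List[bool]]       # True where a pixel is classified as lip

# ASSUMPTION: lips are redder than the surrounding skin, so their pseudo-hue
# R / (R + G) tends to be higher. This threshold is illustrative only.
LIP_PSEUDO_HUE = 0.6
# ASSUMPTION: mean fraction of lip pixels that must change between
# consecutive frames before the lips are declared to be moving.
MOTION_THRESHOLD = 0.2


def lip_mask(frame: Frame, threshold: float = LIP_PSEUDO_HUE) -> Mask:
    """Classify each pixel as lip / non-lip using its pseudo-hue R / (R + G)."""
    return [
        [(r + g) > 0 and r / (r + g) > threshold for r, g, _b in row]
        for row in frame
    ]


def changed_fraction(a: Mask, b: Mask) -> float:
    """Fraction of pixels that are lip in either mask but not in both."""
    union = 0
    diff = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa or pb:
                union += 1
                if pa != pb:
                    diff += 1
    return diff / union if union else 0.0


def lips_moving(frames: Sequence[Frame], threshold: float = MOTION_THRESHOLD) -> bool:
    """Declare the lips moving if the lip region changes enough across frames."""
    masks = [lip_mask(f) for f in frames]
    if len(masks) < 2:
        return False
    changes = [changed_fraction(a, b) for a, b in zip(masks, masks[1:])]
    return sum(changes) / len(changes) > threshold


def lip_centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """(row, col) centroid of the lip pixels, or None if there are none."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def horizontal_position(col: float, width: int) -> float:
    """ASSUMPTION: map an image column to [-1, 1] (left to right) as a simple
    input for panning the speaker's voice in a 3-D audio renderer."""
    return 2.0 * col / (width - 1) - 1.0
```

In a fuller system the color test would presumably be restricted to a face region first, because other reddish objects in the scene would otherwise pass it. The paper's actual lip detection and motion criteria may differ from this sketch.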

Original language: English
Article number: 102
Pages (from-to): 904-911
Number of pages: 8
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 5685
Issue number: PART 2
DOI: 10.1117/12.587326
Publication status: Published - 2005 Jul 21

Fingerprint

Acoustics
Video conferencing
Rooms
Color
Experiments
Interference
Vision

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

Cite this

@article{5b49f6dabfcc4fdf86241805c863504e,
title = "Vision based speaker location detection",
abstract = "Generally, speaker location detection in video conferencing is audio-based. However, physical room environment which is beyond the control of the speaker detection system can severely change room acoustics. Room acoustics introduce interference and can deteriorate the performance of audio-based speaker detection system. In this paper, we propose a video-based speaker detection method which can be used independently or along with audio-based detection systems. The information on speaker location is intended to create 3-dimensional audio reproduction in order to provide more reality to video conference. In the proposed method, we detect moving lips in video sequences. We first detect lips using color information and determine whether the lips are moving. Experiments with real videos provide promising results.",
author = "Jaehyun Lim and Jonggeun Park and Chulhee Lee",
year = "2005",
month = "7",
day = "21",
doi = "10.1117/12.587326",
language = "English",
volume = "5685",
pages = "904--911",
journal = "Proceedings of SPIE - The International Society for Optical Engineering",
issn = "0277-786X",
publisher = "SPIE",
number = "PART 2",

}

Vision based speaker location detection. / Lim, Jaehyun; Park, Jonggeun; Lee, Chulhee.

In: Proceedings of SPIE - The International Society for Optical Engineering, Vol. 5685, No. PART 2, 102, 21.07.2005, p. 904-911.


TY - JOUR

T1 - Vision based speaker location detection

AU - Lim, Jaehyun

AU - Park, Jonggeun

AU - Lee, Chulhee

PY - 2005/7/21

Y1 - 2005/7/21

AB - Generally, speaker location detection in video conferencing is audio-based. However, physical room environment which is beyond the control of the speaker detection system can severely change room acoustics. Room acoustics introduce interference and can deteriorate the performance of audio-based speaker detection system. In this paper, we propose a video-based speaker detection method which can be used independently or along with audio-based detection systems. The information on speaker location is intended to create 3-dimensional audio reproduction in order to provide more reality to video conference. In the proposed method, we detect moving lips in video sequences. We first detect lips using color information and determine whether the lips are moving. Experiments with real videos provide promising results.

UR - http://www.scopus.com/inward/record.url?scp=21844469273&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=21844469273&partnerID=8YFLogxK

U2 - 10.1117/12.587326

DO - 10.1117/12.587326

M3 - Conference article

VL - 5685

SP - 904

EP - 911

JO - Proceedings of SPIE - The International Society for Optical Engineering

JF - Proceedings of SPIE - The International Society for Optical Engineering

SN - 0277-786X

IS - PART 2

M1 - 102

ER -