Two-level bimodal association for audio-visual speech recognition

Jong Seok Lee, Touradj Ebrahimi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.
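
The decision-level step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the stream weights are derived, so the dispersion-based reliability cue below, and every function name (fuse_scores, dispersion, adaptive_audio_weight, recognize), are assumptions made for illustration only. The sketch combines per-class log-likelihoods from an audio stream and a visual stream with a weight estimated from the scores themselves, so no prior knowledge of the noise level is needed.

```python
def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-class log-likelihoods from the two streams."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        label: audio_weight * audio_loglik[label]
        + (1.0 - audio_weight) * visual_loglik[label]
        for label in audio_loglik
    }


def dispersion(loglik):
    """Assumed reliability cue: gap between the best and the mean score.

    A stream whose class scores are nearly equal is treated as unreliable.
    """
    scores = list(loglik.values())
    return max(scores) - sum(scores) / len(scores)


def adaptive_audio_weight(audio_loglik, visual_loglik):
    """Assumed heuristic: weight the audio stream by its relative dispersion."""
    d_audio = dispersion(audio_loglik)
    d_visual = dispersion(visual_loglik)
    total = d_audio + d_visual
    return 0.5 if total == 0 else d_audio / total


def recognize(audio_loglik, visual_loglik):
    """Return the best class after adaptive fusion, and the audio weight used."""
    weight = adaptive_audio_weight(audio_loglik, visual_loglik)
    fused = fuse_scores(audio_loglik, visual_loglik, weight)
    return max(fused, key=fused.get), weight


# Hypothetical scores: the audio stream is noisy, so its classes are hard to separate.
noisy_audio = {"yes": -10.0, "no": -10.2}
clear_visual = {"yes": -12.0, "no": -8.0}
label, weight = recognize(noisy_audio, clear_visual)
```

In this made-up example the audio scores barely separate the two classes, so the estimated audio weight is small and the fused decision follows the visual stream.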

Original language: English
Title of host publication: Advanced Concepts for Intelligent Vision Systems - 11th International Conference, ACIVS 2009, Proceedings
Pages: 133-144
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-04697-1_13
Publication status: Published - 2009 Dec 1
Event: 11th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009 - Bordeaux, France
Duration: 2009 Sep 28 to 2009 Oct 2

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5807 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009
Country: France
City: Bordeaux
Period: 2009 Sep 28 to 2009 Oct 2

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Lee, J. S., & Ebrahimi, T. (2009). Two-level bimodal association for audio-visual speech recognition. In Advanced Concepts for Intelligent Vision Systems - 11th International Conference, ACIVS 2009, Proceedings (pp. 133-144). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5807 LNCS). https://doi.org/10.1007/978-3-642-04697-1_13