Two-level bimodal association for audio-visual speech recognition

Jong Seok Lee, Touradj Ebrahimi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that they are fused adaptively according to the noise condition of the given speech data. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge about the noise conditions of the speech data.
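As a rough illustration of the feature-level step described in the abstract, the sketch below projects two synchronized feature streams into a shared, maximally correlated subspace with canonical correlation analysis and concatenates the projections. The feature dimensions, the use of scikit-learn's CCA, and the concatenation step are illustrative assumptions, not the paper's exact pipeline.

# Minimal sketch of CCA-based feature-level audio-visual fusion (assumed setup).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
T = 500                                 # number of synchronized frames (assumed)
audio = rng.normal(size=(T, 39))        # e.g. MFCC-style acoustic features (assumed dimensionality)
visual = rng.normal(size=(T, 20))       # e.g. lip-region visual features (assumed dimensionality)

# Learn linear projections that maximize the correlation between the two streams.
cca = CCA(n_components=10)
cca.fit(audio, visual)

# Project both streams into the shared correlated subspace and concatenate
# them to form one fused feature vector per frame.
audio_c, visual_c = cca.transform(audio, visual)
fused = np.hstack([audio_c, visual_c])  # shape: (T, 20)
print(fused.shape)

In practice the fused features would feed a recognizer (e.g. an HMM), and a separate decision-level stage would weight the streams according to the estimated noise condition, as the abstract describes.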

Original language: English
Title of host publication: Advanced Concepts for Intelligent Vision Systems - 11th International Conference, ACIVS 2009, Proceedings
Pages: 133-144
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-04697-1_13
Publication status: Published - 2009 Dec 1
Event: 11th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009 - Bordeaux, France
Duration: 2009 Sep 28 - 2009 Oct 2

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5807 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009
Country: France
City: Bordeaux
Period: 09/9/28 - 09/10/2

Fingerprint

Bimodal
Speech Recognition
Information Fusion
Canonical Correlation Analysis
Synchronization
Acoustics
Data Streams
Fusion
Experimental Results
Vision
Speech

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Lee, J. S., & Ebrahimi, T. (2009). Two-level bimodal association for audio-visual speech recognition. In Advanced Concepts for Intelligent Vision Systems - 11th International Conference, ACIVS 2009, Proceedings (pp. 133-144). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5807 LNCS). https://doi.org/10.1007/978-3-642-04697-1_13
@inproceedings{4df8724a11424863a22c9c708a2acdf7,
title = "Two-level bimodal association for audio-visual speech recognition",
abstract = "This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered in two levels. First, the acoustic and the visual data streams are combined at the feature level by using the canonical correlation analysis, which deals with the problems of audio-visual synchronization and utilizing the cross-modal correlation. Second, information streams are integrated at the decision level for adaptive fusion of the streams according to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method is effective for producing noise-robust recognition performance without a priori knowledge about the noise conditions of the speech data.",
author = "Lee, {Jong Seok} and Touradj Ebrahimi",
year = "2009",
month = "12",
day = "1",
doi = "10.1007/978-3-642-04697-1_13",
language = "English",
isbn = "3642046967",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "133--144",
booktitle = "Advanced Concepts for Intelligent Vision Systems - 11th International Conference, ACIVS 2009, Proceedings",

}
