This paper proposes a temporal filtering technique, called visual-speech-pass filtering, for extracting visual features that improve the robustness of automatic lip-reading. A band-pass filter is applied to the pixel-value sequences of images containing the speaker's lip region to remove unwanted variations that are irrelevant to the speech information. The filter is carefully designed based on psychological, spectral, and experimental analyses. Experimental results on two speaker-independent recognition tasks and one speaker-dependent recognition task demonstrate that the proposed technique significantly improves recognition performance in both clean and visually noisy conditions.
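The core idea of temporally band-pass filtering each pixel's value sequence can be sketched as follows. The paper's actual filter design (its passband and implementation) is not specified in this abstract, so this is only an illustrative approximation: a band-pass built from the difference of two moving averages (a short window for low-pass smoothing, a long window whose subtraction removes the slowly varying baseline). The function names, window sizes, and data layout are assumptions for illustration.

```python
def moving_average(x, w):
    # Causal moving average over a window of up to w samples.
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - w + 1):i + 1]
        out.append(sum(seg) / len(seg))
    return out

def bandpass(x, short_w, long_w):
    # Band-pass approximation: short-window smoothing (low-pass)
    # minus long-window baseline (high-pass / DC removal).
    lp = moving_average(x, short_w)
    baseline = moving_average(x, long_w)
    return [a - b for a, b in zip(lp, baseline)]

def filter_pixel_sequences(frames, short_w=3, long_w=15):
    # frames: list of frames; each frame is a list of pixel
    # intensities for the lip region. Each pixel's temporal
    # sequence is filtered independently, as in the abstract.
    n_pixels = len(frames[0])
    series = [[f[p] for f in frames] for p in range(n_pixels)]
    filtered = [bandpass(s, short_w, long_w) for s in series]
    # Re-assemble the filtered values into per-frame layout.
    return [[filtered[p][t] for p in range(n_pixels)]
            for t in range(len(frames))]
```

A constant (non-speech) pixel sequence is driven to zero by this filter, illustrating how static, speech-irrelevant content is suppressed while temporal variations within the passband survive.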
Funding Information:
This research was supported by the Ministry of Science, ICT & Future Planning (MSIP), Korea, in the ICT R&D Program 2013, and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the MSIP (No. 2013R1A1A1007822).
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Artificial Intelligence