Visual speech information has been shown to compensate effectively for the performance degradation of acoustic speech recognition in noisy environments. However, most audio-visual speech recognition systems ignore visual noise, even though it can be introduced into visual speech signals during their acquisition or transmission. In this paper, we present a new temporal filtering technique for extracting noise-robust visual features. In the proposed method, a carefully designed band-pass filter is applied to the temporal pixel value sequences of lip region images to remove unwanted temporal variations caused by visual noise, illumination conditions, or speakers' appearances. We demonstrate that the method improves not only visual speech recognition performance for clean and noisy images but also audio-visual speech recognition performance under both acoustically and visually noisy conditions.
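To make the idea of temporally band-pass filtering pixel value sequences concrete, the following is a minimal sketch and not the filter designed in this paper. It assumes a Hamming-windowed sinc FIR design, a frame rate of 30 fps, and a 1-10 Hz pass band; these values, the tap count, and all function names are hypothetical choices made only for illustration. Because the kernel has zero gain at DC, constant offsets in a pixel's sequence (such as a fixed illumination level or a speaker's static appearance) are removed, while variations inside the pass band are kept.

```python
import math


def lowpass_kernel(cutoff_hz, fps, num_taps):
    """Hamming-windowed sinc low-pass FIR kernel, normalized to unit DC gain."""
    fc = cutoff_hz / fps              # cutoff in cycles per frame
    mid = (num_taps - 1) / 2          # center tap (num_taps is assumed odd)
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def bandpass_kernel(low_hz, high_hz, fps, num_taps=31):
    """Band-pass kernel built as the difference of two low-pass kernels.

    Both low-pass kernels have unit DC gain, so the difference has zero DC gain.
    The cut-off values are illustrative assumptions, not taken from the paper.
    """
    low = lowpass_kernel(low_hz, fps, num_taps)
    high = lowpass_kernel(high_hz, fps, num_taps)
    return [h - l for h, l in zip(high, low)]


def filter_sequence(values, kernel):
    """Filter one pixel's temporal sequence, replicating the edge frames as padding."""
    half = len(kernel) // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [
        sum(k * padded[t + j] for j, k in enumerate(kernel))
        for t in range(len(values))
    ]


def filter_frames(frames, kernel):
    """Apply the temporal filter independently at every pixel position.

    `frames` is a list of T lip-region images, each a list of rows of gray values.
    """
    height, width = len(frames[0]), len(frames[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in frames]
    for r in range(height):
        for c in range(width):
            sequence = [frame[r][c] for frame in frames]
            for t, value in enumerate(filter_sequence(sequence, kernel)):
                out[t][r][c] = value
    return out


# Hypothetical usage: 60 frames of a 2x2 lip region at an assumed 30 fps.
kernel = bandpass_kernel(low_hz=1.0, high_hz=10.0, fps=30.0)
frames = [[[100.0 + 10.0 * math.sin(2 * math.pi * 4.0 * t / 30.0)] * 2] * 2
          for t in range(60)]
filtered = filter_frames(frames, kernel)
```

In this sketch the constant brightness of 100 is suppressed, while the 4 Hz variation, which lies inside the assumed pass band, is retained. Any features for recognition would then be computed from the filtered frames rather than the raw ones.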